Abstract


Explicit or implicit, enforced or not, safety policies are ubiquitous in software systems. In the many settings where third-party software is executed in the context of a larger client program, the supervisor usually enforces a safety policy that prevents the foreign code from behaving in ways that would disrupt the client, corrupt data or destabilize the system. Certified code provides a static means for controlling the behavior of untrusted programs or components by bringing the power of type systems and formal logic to bear on the problem. Code certification systems that prevent bad memory accesses and enforce the abstractions provided by libraries and runtime system interfaces have been well studied.

This thesis presents a system for certifying conformance to timing requirements. The approach is simple, comprising an incremental change to an existing type system for assembly language, but flexible in the set of policies it can enforce. Moreover, in principle, it can be extended to support arbitrarily complex coding idioms. Focusing on a particular timing policy of interest, I describe a compiler that produces certifiably compliant programs with no help from the programmer and only a small impact on runtime performance. Later, I discuss the applicability of both the type system and the compilation techniques to other timing and resource control problems.
Chapter 1

Introduction 


[An] operating system ... is a program that keeps track of other programs in a computer and gives each its due in space and time. (Guy L. Steele Jr. [65])

Computers are useful precisely because they can be programmed. The success of programming depends on the ability of programmers to construct sequences of instructions whose behavior, when they are executed faithfully by hardware, is consistent with some design. This is harder than it sounds, partly because although the basic operation of computer hardware is deterministic, programs are executed in a complex environment of simultaneously running processes multiplexed onto a machine that usually must serve multiple purposes for multiple users. This arrangement is intended to improve performance (multiple processes can make more effective use of a computer's resources), usability (users like to have more than one application active at a time) and modularity (programs responsible for different functions can be designed independently), but these gains cannot be realized if the resulting environment is no longer predictable enough to be programmed. To keep the complexity under control, we use operating systems, which supervise application programs and prevent their uncontrolled interference with one another. The job of an operating system necessarily involves the setting and enforcement of safety policies regarding diverse aspects of program behavior, including memory access, privileged instruction execution, and resource usage.

The relationship between an operating system and the application programs running under it, characterized by the setting and enforcement of rules that restrict the behavior of each process for the benefit of the whole system, is not unique to OS-application interaction. A similar relationship can be found in sophisticated modern software systems based on the dynamic linking and execution of third-party code, such as mobile agents, web applets or application plug-ins. The principal roles are the supervisor, which is an application or server process the computer's owner trusts implicitly, and one or more subprocesses, which may be untrusted. The supervisor sets up and enforces specific behavioral rules that the subprocess must obey; such a set of rules is called a safety policy. (For the purposes of this thesis, I call it a "safety" policy even if it is not a safety property in the sense of Lamport [40] or of Alpern and Schneider [2]. This is consistent with the usage of the term in the certified code literature.) It is worth noting that safety policies are often complex and application-specific, particularly when the supervisor-subprocess relationship exists between two user-level processes, or between an application and an untrusted module that runs as part of the same process. The safety policy of an operating system, by contrast, is generally as permissive as the operating system implementor can allow.

Indeed, the lower the level of abstraction at which one observes the operation of a computer, the more permissive the safety policy appears to be. The native instruction set of a general-purpose microprocessor (such as Intel's IA-32, also known as x86) is a language with a very small number of types: IA-32 has three different integer types, three different floating-point types, and nothing else. Furthermore, all of the primitive operations of the language (address calculations, loads, stores, and ALU and FPU operations) are syntactically restricted so that any well-formed application of any operation to any value has a well-defined outcome [38]. In fact, we can say that as far as the processor architect is concerned, machine language is type safe. This counterintuitive assertion is accurate because the hardware designer is not concerned with any notion of safety other than that the machine always behaves according to its specification.¹

When a hardware implementation of IA-32 is used for a nontrivial purpose, such as serving as the CPU of a personal computer, a different notion of safety is called for. Application programs are designed in relative isolation but are executed in an unforeseeable and continually changing context of other concurrent processes. To control the interactions between processes, operating systems impose certain requirements on program behavior: a process must not access memory outside of its designated address space, and it must not attempt to perform certain "privileged" operations. Any violation of these rules is detected at run time by the hardware, and the operating system abruptly and unapologetically terminates the offending process. The rules, and their enforcement by the operating system, greatly limit the range of environmental conditions and events an application programmer must anticipate. This makes it possible to write programs that respond to these conditions predictably, even though the behavior of the other processes on the system cannot be known in advance.

As is well known, most operating systems achieve this end by constantly monitoring every running process to detect violations of its safety policy and forcibly correcting errant behavior. Indeed, for a long time conventional wisdom held that this was the only way to do it. After all, conformance to any nontrivial safety policy is an undecidable property of program behavior, so detecting potential safety violations before run time seems impossible. The advent of certified code, however, has made it clear that this inference is not valid. In particular, it is possible to design refinements of the overly permissive type system of machine language that are strong enough to rule out many forms of unacceptable behavior. Using these type systems, potentially unsafe programs can be rejected statically by requiring every program to be accompanied by additional information that establishes its safety. The checker therefore never has to decide safety on its own. It only has to verify the evidence supplied with the program, and it may conservatively reject any program whose evidence is missing or invalid, even if that program is in fact safe.

Enforcing safety policies by certification rather than run-time monitoring has most of the benefits of type-safe programming in general. Hardware support for detecting violations lessens the cost of monitoring a running program: catching references to unmapped memory pages involves a considerably smaller overhead on most hardware architectures than, say, the tag-checking necessary to implement dynamically typed languages like Scheme. Even so, some expense must be incurred when transferring control to a user process in order for the OS to be ready for anything that process might do. Furthermore, since the hardware does play such a critical role, the range of possible safety policies an OS can implement (and consequently the range of reliability guarantees it can make to application developers) is limited by the capabilities of the hardware. In contrast, the set of policies implementable by certification is relatively independent of the hardware. Finally, the actions taken by an OS in response to a misbehaving process are often drastic and disruptive, and the unexpected termination of one process due to a safety violation can often do harm to other processes that interact with it. By ruling out bad behavior before allowing a program to run, an operating system could reasonably promise users that the program will not be prematurely terminated.

¹ In fact, anecdotal evidence suggests that even this is too strong to describe chip designers' goals accurately.
To date, advocates of safety by certification have focused their rhetoric on so-called "mobile code", and even more specifically on "untrusted mobile code". This apparent restriction of focus tends to undersell the technology because, after all, all software is mobile, and most of it is untrusted. It is very rare that a program resides and runs on the same computer from the moment it is created until it is discarded for good. Furthermore, almost all software must pass from one human owner to another at some point in its life cycle. Sometimes these transfers between individual or corporate owners are well-controlled commercial transactions, but in an increasing variety of scenarios, computers execute code from sources their owners do not know well. Often this is the result of an explicit decision by the owner. It is common practice for users to download programs from unfamiliar web sites, and install and run them even though they have no reason to trust the authors' intentions, skills, or choice of development tools. On occasion, untrustworthy code is executed at a user's apparent request, but as a result of confusion or accident; many e-mail worms spread this way.

Of course, a growing number of applications involve sending code from one machine to another for automatic execution, without any human participation in the process. This, as opposed to the above, is the phenomenon uncontroversially referred to as mobile code. Most computer users have observed mobile code in action. Ever since the late 1990s, many World Wide Web pages have contained embedded "applets" that are executed by browsers with the aid of a Java Virtual Machine implementation [42]. Many if not most Web pages today contain embedded code in JavaScript or a similar language to be interpreted by the browser.

Some applications of mobile code use it in ways that are not directly visible to users. Mobile agents, programs that autonomously migrate from host to host to take advantage of localized resources, have achieved buzzword status over the past decade or so [41, 7]. More recently, grid computing has emerged as a powerful paradigm that relies heavily on mobile code. For the purposes of this thesis, "grid computing" simply means large-scale distributed computing on a heterogeneous collection of computers connected by the Internet. This includes scientific computing endeavors such as SETI@Home [62] and Folding@Home [24], as well as CMU's ConCert infrastructure [9] and a host of others being developed in academia and industry. Since the hosts participating in a grid computation are owned by numerous different people or organizations and located all over the world, the task of distributing application code to all of the participants is nontrivial. General grid computing infrastructures like ConCert differ from single-purpose grids such as the SETI@Home network: for them, the automatic transport and execution of mobile code (untrusted mobile code, in fact) is essential. Indeed, ConCert programs are designed to start on a single host and recruit additional hosts as they run, spawning new copies of themselves that migrate to new locations in much the same way mobile agents do.

The security risks associated with automatically executing untrusted code are hard to overstate. The proliferation of e-mail worms that infect a computer when the user opens an attached file shows that even well-educated human users can be tricked into running harmful programs. Removing the user from the scenario and executing downloaded software automatically (as mobile code hosts do) without some kind of security measure would clearly be a disaster. The standard advice to users regarding e-mail worms is not to open attachments unless they both trust the apparent sender and were expecting the message. That advice is hard to apply to machines that are supposed to play host to mobile agents or ConCert grid programs. After all, mobile agents arrive for execution unsolicited and without warning, and they often do not come directly from their place of origin. Furthermore, it is often necessary or desirable that hosts be willing to execute code whose authors they do not know.
1.1 Certified Code

Static safety and security verification of software has become increasingly commonplace over the past decade, beginning with the introduction of Java by Sun Microsystems in 1995-96 [66]. Designed expressly for network-based computing, Java technology is based on an object-oriented virtual machine (the Java Virtual Machine or JVM [42]) whose high-level bytecode language was intended for use as an interchange format for software. Security was a major selling point. The Java Virtual Machine could be configured to support a range of different security policies. Because the JVM language was supposed to be type safe, a Java program could not interact with the network or file system except through the carefully designed interfaces provided by the virtual machine. Programs would be verified prior to execution, to ensure that no Java bytecode could circumvent the protection of the type system, no matter how malicious or incompetent its author.

The central role of the Java type system in the security of Java-based software inspired many detailed and critical investigations into whether or not it was sound. The results were published in papers with titles like "Java is not type-safe" [61] and "Java is type-safe — probably" [20]. They showed mainly that the Java type system was large and complex, with many dark corners in which unsafety might be overlooked. The resistance of Java to formal analysis no doubt helped to fuel the trend toward foundationalism in certified code, described later in this section. In the end, formal analysis of Java led to positive results for subsets of the type system [20, 64, 26], as well as to the discovery of a number of bugs (e.g., [61, 25]).

1.1.1  Classic  Proof-Carrying  Code 

The foundations for more formal code certification were laid in 1996-97 in the sequence of papers by Necula and Lee that introduced proof-carrying code (PCC) [52, 50, 51]. Among other things, these papers established the basic vocabulary of the field, including the terms code producer, code consumer, and safety policy. I often call this original approach "classic" PCC, to distinguish it from the variations that appeared subsequently. In classic PCC, an operational semantics for a safe but undecidable subset of assembly language is used as an informal guide to the manual construction of a program called a verification condition generator (VCGen). The VCGen analyzes the code to be certified and computes a first-order formula that implies the safety of the code (the verification condition or VC). A theorem prover is then used to generate a formal proof of the VC, which constitutes the safety evidence for the program. The code consumer validates the certified binary by running the VCGen to extract the code's verification condition and then using a proof checker to verify that the certificate is a valid proof of the VC.
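
The sketch below is a toy illustration of what a VCGen computes. It is not Necula and Lee's VCGen. It handles only straight-line code made of assignments and safety assertions, and it computes a weakest precondition by working backwards from the end of the program. The instruction format, the tuple encoding of formulas, and the names `subst` and `vcgen` are hypothetical choices made for this example. A real VCGen works on machine instructions and handles branches and loops, typically with the help of loop invariants. It also passes the resulting formula to a theorem prover rather than merely returning it.

```python
from typing import Union

# Hypothetical encoding: formulas and expressions are nested tuples such as
# ("+", "i", 1) or ("<", "i", 16); variables are strings; constants are ints/bools.
Expr = Union[str, int, bool, tuple]


def subst(e: Expr, var: str, repl: Expr) -> Expr:
    """Replace every occurrence of variable `var` in `e` with `repl`."""
    if isinstance(e, str):
        return repl if e == var else e
    if isinstance(e, tuple):
        op, *args = e  # the operator symbol itself is never substituted
        return (op, *(subst(a, var, repl) for a in args))
    return e  # integer or boolean constant


def vcgen(program: list, post: Expr = True) -> Expr:
    """Weakest precondition of `post` over a straight-line toy program.

    Instructions are ("assign", var, expr) or ("assert", formula); the
    assertions stand in for the safety checks a policy would demand.
    """
    vc = post
    for instr in reversed(program):  # work backwards from the end
        if instr[0] == "assign":
            _, var, expr = instr
            vc = subst(vc, var, expr)  # wp(x := e, Q) = Q[e/x]
        elif instr[0] == "assert":
            vc = ("and", instr[1], vc)  # wp(assert P, Q) = P and Q
        else:
            raise ValueError(f"unknown instruction: {instr!r}")
    return vc


# Example: bounds checks around an increment of an index i.
program = [
    ("assert", ("<", "i", 16)),
    ("assign", "i", ("+", "i", 1)),
    ("assert", ("<=", "i", 16)),
]
print(vcgen(program))
```

The resulting formula requires that the initial `i` be below 16 and that `i + 1` not exceed 16. In the PCC workflow, the producer would prove such a formula and ship the proof. The consumer would recompute the formula and check the proof against it.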

Necula and Lee's early experiments applied the PCC technique to packet filters hand-coded in assembly language [52]. Necula's Ph.D. thesis developed certifying compilation, focusing on a high-level language called Safe-C [54]. Years later, a very similar certification infrastructure was the target of a certifying compiler for Java, called Special J, developed by Cedilla Systems [8]. Scale aside, the certification of hand-coded packet filters and that of full-fledged Java programs differed strikingly in one respect. In the early experiments, the formalized logical discourse (verification conditions and proofs) was conducted at a very low level of abstraction, determined almost entirely by a fairly realistic operational semantics for the machine. The certification in Special J, by contrast, made heavy use of a collection of language-specific predicates, whose meanings were defined only by a set of ad hoc axioms. These predicates and axioms connected the instruction-by-instruction actions of the machine code with the higher-level Java type system that guaranteed the safety of the source program. Other researchers would later question the wisdom of removing the formalized logical activity so far from the hardware level of abstraction.

1.1.2  Typed  Assembly  Language 

The use of type systems for code certification was made quite explicit with the introduction of typed assembly language (TAL) by Morrisett et al. in 1999 [47]. The original TAL paper concerned a language based on a generic RISC-like architecture. It showed how various programming idioms, including procedure linkage conventions, could be described using typing constructs that amounted more or less to System F with sums, products and existential types. The abstract machine in the first TAL paper had no stack. Stack-Based TAL (STAL) [46] overcame this technical limitation, enabling a concrete implementation on the Intel IA-32 called TALx86 [45].

The theory and the implementation of TAL had a good deal of influence over certified code research that followed, including the present thesis. The deep connections it forged between conventional type theory and low-level code have allowed ever more complex and powerful type systems to be brought to bear on code certification problems. In addition, the implementation of TALx86 has been used in a number of subsequent research endeavors and has even proven robust enough to serve as the target of a full-scale ML compiler [56].

1.1.3  Foundationalism 

Classic PCC and TALx86 introduced a certain degree of formalism to code certification, but it was soon apparent that there was room for more. In 2001, Appel et al. kicked off the trend of foundationalism in certified code. They observed that Necula and Lee, and Morrisett et al., had intended to base their systems on sound type theory and logic. Despite those intentions, the security of these systems depended on the correctness of large amounts of code and ad hoc theory that were verified only informally or not at all.² Specifically, Appel noted, it would have been "a daunting task" to meticulously check every aspect of the type safety proof for TALx86, even though the simpler abstract TAL had been subjected to rigorous scrutiny. Formally verifying the software implementing TAL's type-checker or PCC's VCGen would have been harder still [3].

Appel et al.'s foundational proof-carrying code (FPCC) aims to place code certification on a sound formal footing by reducing the trusted computing base as much as possible. The trusted computing base is the body of code and theory that one must trust in order to believe in the system's security guarantees. The "proof" attached to an FPCC program is nothing other than a machine-checkable proof (in higher-order logic) of the proposition that the program obeys the safety policy. The main problem with this enterprise was one of scale. How in the world could a proof of safety for a practical-sized program be generated automatically? And how could it be represented compactly enough to be transmitted with mobile code over a network and verified in a reasonable amount of time?

A type system for machine language played a central role in the solution developed by Appel et al. They defined a semantic model of the types in their system, which amounts to defining a logical predicate for each type characterizing the structure of values of that type [4]. The model, together with a formal specification of the operational semantics of the machine, allowed the soundness of each typing rule to be proven as a lemma. A proof of safety for a given program could then be constructed from a typing derivation for that program, replacing applications of typing rules with applications of the corresponding lemmas.

² "Ad hoc" is not pejorative here. The basic principles of type theory were considered to rest on solid ground; it was the metatheoretic work concerning the specific properties of PCC and TAL that was potentially in question.
Constructing formal semantic models and proving them correct in machine-checkable higher-order logic is not easy, and was perceived as the weak link in FPCC. In 2002, Hamid et al. [28, 29] proposed an alternative. Their predecessors had formalized proofs of type safety in what amounted to a denotational semantics for a low-level type system. Hamid et al. instead formalized a syntactic safety proof in the style of Wright and Felleisen [71]. This style of proof was already known to be much more tractable than model-theoretic soundness proofs. Once the soundness of the system as a whole was proven, the proof obligation for any specific program was reduced, as before, to showing that it was well-typed.

The proofs in the syntactic FPCC of Hamid et al. were encoded in the Calculus of Inductive Constructions [55]. The next year, Crary unveiled TALT, a "foundational typed assembly language" with a machine-checkable safety proof encoded in the Twelf meta-logic [14, 13]. As I will explain later on, TALT was the most advanced and expressive type system for machine language to date. It also served a higher purpose as the underlying type system of the first implemented instance of a metalogical foundational certified code framework. This framework provides the context for all the technical work in this thesis, so my first order of business will be to describe its operation. That exposition makes up Chapter 2 of the thesis.

1.1.4  Static  Safety  in  Operating  Systems 

The notion of enforcing safety policies by static rather than dynamic means seems to have received only sporadic interest from the operating systems community until fairly recently. This is presumably due to two factors: the historical focus on performance as the primary goal of operating systems research, and a widespread belief that static policy enforcement hurts performance. In recent years, however, security has emerged as an issue of paramount concern [67].

The most-cited example of an operating system leveraging the static safety properties of a programming language is SPIN [6]. SPIN is an extensible operating system that allows applications to extend the kernel with specialized paging algorithms, network protocol implementations, and other performance-enhancing modules written in the type-safe Modula-3 language. As noted, the original PCC experiments conducted by Necula and Lee focused on the ability to link untrusted code into an operating system kernel safely.

Singularity is an operating system under development at Microsoft Research [37]. It differs from SPIN in its scope. Not only the microkernel, all device drivers and system services, but also all user-level applications must be written in a type-safe language and verified prior to execution. In defense of this somewhat radical shift, the Singularity team measured the costs of hardware-based dynamic versus software-based static isolation of processes. The results led them to declare that the benefit of eliminating the overhead of dynamic safety policy enforcement is significant [1].

Admittedly, the dynamic approach has a substantial head start, and may have influenced software and hardware design too much for static approaches to be adopted in commercial operating systems any time soon. On the other hand, type-based certification is what makes this otherwise impossible idea a potentially viable alternative. It has not existed for very long compared to the amount of time the systems community has spent engineering systems based on the dynamic approach. Whether these recent developments are the leading edge of a major change in computer software architecture remains to be seen.

Even if the policies of mainstream operating systems continue to be dynamically enforced for the foreseeable future, many applications that rely on untrusted third-party code have policies of their own that they must enforce. Furthermore, these policies might be concerned with aspects of program behavior over which the operating system does not provide direct control, or applications may require a finer degree of control than the operating system offers. In situations like these, code certification can provide a means to enforce the application-specific policies.

1.2  Timing  Properties  for  Safety 

In order for all of the processes on a computer system to function properly, it is important that each of them be given enough time in which to run. In addition, for applications requiring user interaction or real-time control of devices, it is important that the intervals during which a given process is not running be short enough to allow that process to react to events in a timely manner. To address these requirements, operating systems manage the scheduling of processes. Time is allocated to processes in quanta known as time slices. At the start of a time slice, the OS hands over almost total control of the CPU to a user process with the understanding that it will only keep that control for a certain amount of time. If, at the end of that interval, the user process has not performed any action to return control to the operating system, it is preempted. The state of the process is saved and it becomes dormant until the system chooses to give it another time slice.

It is usual to view this multiplexing of processes onto the CPU as a resource allocation task performed by the operating system. However, it can also be viewed as a part of the problem of enforcing rules of behavior for processes. As I have said, the operating system must set up and enforce behavioral rules that programs running on the machine must obey. The requirement that a program not keep control of the CPU for too long at a time is not fundamentally different from the requirement that it access only its own memory pages or that it refrain from performing privileged instructions. In all of these cases, one program's failure to comply with the operating system's policy might affect other programs' behavior in ways the other programs' authors could not have anticipated. Therefore, I choose to see preemption as an enforcement mechanism for timing policies. The policy states that no program shall keep continuous control of the CPU for more than the length of one time slice. The enforcement mechanism relies on detecting violations at run time and forcibly correcting an offending program's behavior.
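
The following small simulation makes this run-time mechanism concrete. It is a sketch under simplifying assumptions, not a model of any real kernel. It shows a round-robin scheduler that enforces the one-time-slice rule by preemption. A job is modeled, hypothetically, as a list of steps. Each "work" step takes one tick, and a "yield" step hands control back voluntarily. The function name `run_round_robin` and the log format are invented for this example.

```python
from collections import deque


def run_round_robin(jobs: dict[str, list[str]], quantum: int) -> list[tuple[str, int, str]]:
    """Simulate preemptive round-robin scheduling with a fixed time slice.

    Every step other than "yield" counts as one tick of work. A job that has
    used `quantum` ticks in the current slice without yielding is preempted
    before its next work step. Returns one (job, ticks_used, reason) entry
    per time slice, where reason is "yielded", "preempted" or "finished".
    """
    if quantum < 1:
        raise ValueError("quantum must be at least one tick")
    ready = deque((name, 0) for name in jobs)  # (job name, index of its next step)
    log = []
    while ready:
        name, pc = ready.popleft()
        steps = jobs[name]
        ticks, reason = 0, "finished"
        while pc < len(steps):
            if steps[pc] == "yield":
                pc += 1  # voluntary return of control to the scheduler
                reason = "yielded"
                break
            if ticks == quantum:
                reason = "preempted"  # the OS forcibly takes the CPU back
                break
            ticks += 1  # one tick of work
            pc += 1
        log.append((name, ticks, reason))
        if pc < len(steps):
            ready.append((name, pc))  # state saved; job waits for another slice
    return log


jobs = {
    "A": ["work"] * 5,                       # never yields
    "B": ["work", "work", "yield", "work"],  # yields voluntarily
}
for entry in run_round_robin(jobs, quantum=3):
    print(entry)
```

Job A never yields, so it is preempted at the end of its first slice. Job B gives up the CPU on its own and is never interrupted.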

It is natural to ask, then, whether it is possible to enforce timing policies such as this one
statically. Is it possible to use code certification to guarantee before running a program that it does
not violate the policy? If it is possible, then doing so could relieve operating systems of the
responsibility of enforcing the rules by detecting violations at run time: if a program passes the necessary
verification  prior  to  running,  there  will  be  nothing  to  detect.  Just  as  the  static  process  isolation  in 
Singularity  eliminates  the  need  for  hardware-supported  memory  protection,  static  enforcement  of 
timing  policies  could  eliminate  the  need  for  preemption. 
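A very crude form of static enforcement can be sketched as follows. This is not the type-based approach developed in this thesis. It assumes, hypothetically, that a program is given as a control-flow graph in which each block has a known worst-case cost and a flag recording whether the block ends by yielding. The check then confirms, before execution, that no path runs longer than one time slice without yielding. The names worst_run and respects_slice are illustrative only.

```python
import math

def worst_run(cfg, block, memo, visiting):
    """Worst-case CPU time from entering `block` until the next yield or exit.

    cfg maps a block name to (cost, yields, successors). A block with
    yields=True ends by yielding, so time stops accumulating after its cost.
    A cycle that never passes through a yielding block has unbounded cost.
    """
    if block in memo:
        return memo[block]
    if block in visiting:
        return math.inf  # yield-free cycle: the program could run forever without yielding
    cost, yields, succs = cfg[block]
    if yields or not succs:
        result = cost
    else:
        visiting.add(block)
        result = cost + max(worst_run(cfg, s, memo, visiting) for s in succs)
        visiting.discard(block)
    memo[block] = result
    return result

def respects_slice(cfg, entry, slice_len):
    """Static check: every stretch of execution between yields fits in one slice."""
    memo = {}
    # Execution (re)starts at the entry block and right after every yield.
    starts = {entry} | {s for _, yields, succs in cfg.values() if yields for s in succs}
    return all(worst_run(cfg, b, memo, set()) <= slice_len for b in starts)

cfg = {
    "entry": (3, False, ["loop"]),
    "loop": (4, True, ["loop", "exit"]),  # the loop body ends with a yield
    "exit": (1, False, []),
}
print(respects_slice(cfg, "entry", slice_len=10))  # True: longest stretch is 3 + 4 = 7
print(respects_slice(cfg, "entry", slice_len=5))   # False: 7 > 5

spin = {"entry": (1, False, ["spin"]), "spin": (1, False, ["spin"])}
print(respects_slice(spin, "entry", slice_len=100))  # False: a loop that never yields
```

Treating any loop that can cycle without yielding as unbounded keeps the analysis simple. The price is that it rejects many programs that would in fact yield in time.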

Simplifying operating system implementation is only the beginning. Fundamentally, an
operating system must regard non-certified programs with unreserved suspicion. They must be treated
as adversaries and firmly regulated in order to keep the system as a whole secure. Certified
programs, on the other hand, can be expected to play fair, and can therefore be given some freedom
to  control  their  own  execution.  A  certified  program  can  be  trusted  to  choose,  within  appropriate 
limits,  exactly  when  it  will  yield.  Consequently,  it  can  be  given  the  responsibility  of  saving  its  own 
state before a context switch (which it knows how to do better than the OS can) and can arrange
to  avoid  yielding  at  inconvenient  times,  such  as  inside  short-lived  loops  where  the  loss  of  data 
from  the  cache  might  be  particularly  disruptive. 

Certification-based enforcement of yielding policies is consistent with the so-called "pay-as-you-go"
principle, which encourages the design of systems in which one does not incur costs
for unused features. The overhead of preemptive multitasking is generally understood to be a
significant but unavoidable cost of guaranteeing the stability and reliability of a computer system.
Many programs, though, such as those that perform a lot of blocking I/O operations or spend a
lot  of  time  waiting  for  GUI  events,  tend  not  to  hold  on  to  the  CPU  for  very  long.  The  overhead  of 
preemption  would  be  unnecessary  for  such  programs,  if  only  the  scheduler  could  identify  them 
with  certainty.  In  traditional  systems  with  non-certified  executables,  this  clearly  cannot  be  done. 
However,  if  programs  were  known  in  advance  to  be  cooperative,  the  overhead  of  arranging  for 
preemption  by  a  timer  interrupt  that  will  never  occur  could  be  avoided.  Certification  of  timing 
policies  would  make  this  possible. 

The  idea  of  enforcing  timing  policies  with  code  certification  goes  back  all  the  way  to  Necula 
and  Lee,  who  observed  that  PCC  could  be  used  to  guarantee  bounded  running  time  of  programs 
[51]. In Crary and Weirich's type theory LXres and assembly language TALres, the type of a
function specifies its running time, often as a function of the structure of its arguments [18]. More
recently, Naik [49] described a Typed Interrupt Calculus, a core language for interrupt programming
whose type system guarantees that all interrupts will be handled before their associated
deadlines. To date, none of these approaches has produced a workable solution for general-purpose
programming  on  a  realistic  scale.  Nonetheless,  the  basic  structures  needed  for  reasoning  about 
time  introduced  by  Necula  and  Lee  and  picked  up  by  Crary  and  Weirich  and  by  Naik  form  the 
basis  for  my  work  as  well. 

Some  timing  properties  can  be  enforced  using  a  type  system  based  on  linear  logic.  In  particular, 
Hofmann has shown that any program in a certain linear λ-calculus denotes a polynomial-time
function [35]. This is a weaker property than any realistic safety policy, since even a constant-time
function  can  take  longer  to  finish  than  a  supervisor  is  willing  to  wait.  Also,  the  linear  type  system 
is  not  particularly  user-friendly.  The  real  advantage  of  Hofmann's  calculus  is  that  it  controls  space 
usage  of  programs  by  forbidding  them  to  allocate  new  storage;  the  linear  typing  discipline  allows 
preallocated  space  to  be  reused  in  a  type-safe  way.  Hofmann  and  Jost  later  showed  how  to  allow 
a  limited  amount  of  allocation  while  controlling  the  total  memory  usage  of  programs  [36].  The 
issue  of  space  usage  was  also  addressed  by  Aspinall  et  al.  in  a  logic  for  reasoning  about  resource 
usage  in  a  fragment  of  JVM  bytecode  [5]. 

1.3  Thesis  Overview 

My  claim  in  this  thesis  is  that  static  enforcement  is  possible  for  a  wide  range  of  timing  policies. 
Indeed, I claim that the complexity of enforcing such policies is a small increment over that of
enforcing memory safety. Finally, I claim that certifiable adherence to the policy I focus on presents no
burden  to  most  application  programmers  and  requires  only  a  modest  contribution  from  the 
implementors  of  the  development  tools  they  use. 

The technical content of this thesis begins in Chapter 2 with an overview of the Crary-Sarkar
metalogical code certification framework and the TALT type system. The next five chapters
comprise a case study in certifying compliance with a specific timing policy. Chapter 3 describes this
policy, called responsiveness, and the type system called TALT-R that I developed to certify
programs that obey it. Chapter 4 explores the metatheory of a key subsystem of TALT-R.

The  generation  and  certification  of  responsive  programs  is  laid  out  in  the  next  three  chapters. 
To  support  my  claim  that  certifiable  responsiveness  presents  no  burden  to  most  programmers,  I 
consider  the  problem  of  generating  compliant  binaries  without  the  benefit  of  any  timing-related 
input  from  the  programmer,  because  programmers  of  traditional  preemption-based  systems  are 
not accustomed to providing any. Chapter 5 describes a timing-ignorant typed intermediate
language that will serve as the source language in my discussion of compilation. Chapter 6
covers the basic techniques I have worked out for making programs certifiably responsive, and
Chapter 7 gives a formal translation from the language of Chapter 5 to TALT-R.

Chapter 8 discusses the application of the ideas in TALT-R to certification of other timing
policies and to other resource control problems. In Chapter 9 I present the results of some empirical
experiments with TALT-R, discuss possible directions for further work on this subject, and give
my  final  conclusions. 
Chapter 2

TALT  Background 


A  language  design  can  no  longer  be  a  thing.  It  must  be  a  pattern  —  a  pattern  for  growth  — 
a pattern for growing the pattern for defining the patterns that programmers can use for their
real  work  and  their  main  goal.  —  Guy  Steele  [65] 

The  code  certification  machinery  I  have  designed  to  enforce  timing  policies  is  based  on  the 
existing  body  of  work  by  Crary  and  Sarkar  on  a  so-called  metalogical  approach  to  foundational 
certified  code  [15],  including  the  type  system  TALT  [14].  In  this  chapter  I  review  the  basic  ideas 
necessary  to  understand  what  this  means,  and  sketch  the  type  system  and  semantics  of  TALT. 
Readers familiar with Crary and Sarkar's work may find that this chapter is mostly review;
however, it establishes the meanings of several terms that I will use throughout the remainder of the
thesis. 

2.1  Metalogical  Foundational  Certified  Code 

The  metalogical  approach  to  foundational  certified  code  was  developed  concurrently  with  the 
TALT type system, and the needs of each influenced the design of the other. It is a mistake, though,
to think of the formalized metatheory developed by Crary and Sarkar simply as "the TALT safety
proof,"  however  common  it  may  be  to  call  it  that.  The  intent  was  much  more  general:  to  formalize 
a  single,  foundational  safety  policy,  in  a  logic  capable  of  proving  the  safety  of  as  wide  a  variety  of 
programs  as  practical.  The  preferred  way  to  structure  safety  proofs  for  programs  was  to  base  them 
on type systems; TALT is merely one point in the vast design space of type systems for machine
language  whose  safety  can  be  proven  within  this  framework.  It  serves  (among  other  purposes)  to 
demonstrate  how  type  safety  proofs  in  the  metalogical  framework  may  be  constructed  and  as  a 
starting  point  for  the  design  of  more  expressive  or  more  specialized  type  systems. 

2.1.1  LF,  Elf  and  Twelf 

The "metalogical" aspect of Crary and Sarkar's approach to foundational certified code lies in its
use  of  the  Twelf  metalogic  for  the  expression  of  the  safety  policy  and  the  statements  and  proofs 
of  all  safety  theorems.  It  appears  to  be  common  in  colloquial  speech  among  users  of  the  Twelf 
system  to  say  that  these  definitions  and  proofs  are  conducted  "in  LF,"  but  my  opinion  is  that  this 
leads  to  confusion.  For  the  purposes  of  this  thesis,  therefore,  I  make  the  following  definitions. 

LF, the Edinburgh Logical Framework, is either the type theory first given that name by Harper,
Honsell and Plotkin [31] or its subsequent reformulation by Harper and Pfenning [32]. LF is
instantiated by specifying a set of uninterpreted constants, each with a type or kind as appropriate;
this  information  comprises  a  signature.  The  LF  type  theory  itself  is  too  weak  to  be  of  practical 
interest  without  a  nontrivial  signature. 

Elf is a logic programming system based on LF. An LF signature is treated as a logic
program; Elf attempts to answer queries of the form "Is there a substitution of closed terms for the
free variables in A such that the resulting type is inhabited?" by conducting a search based on
unification and backtracking. The name "Elf" is rarely heard nowadays, since the software
implementing it was expanded considerably and renamed "Twelf" circa 1999. Nevertheless, I will refer
to  LF  signatures  (or  fragments  thereof)  that  are  intended  to  be  executed  by  the  logic  programming 
interpreter  as  Elf  logic  programs. 
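The kind of search Elf performs can be illustrated, very loosely, with a first-order Horn-clause interpreter. The Python sketch below is a toy and not a model of Elf. Elf searches over dependently typed LF terms and relies on a decidable fragment of higher-order unification. This sketch instead uses plain first-order terms, omits the occurs check, and tries clauses in order with depth-first backtracking. The term encoding and all function names are assumptions made for illustration. The plus program is the standard textbook example, shown in Twelf-like notation in a comment.

```python
import itertools

# Terms: a variable is a str (e.g. "N"); a compound term is a tuple
# (functor, arg1, ..., argN); a constant is a 1-tuple such as ("z",).

def walk(t, subst):
    """Follow variable bindings until reaching a non-variable or an unbound variable."""
    while isinstance(t, str) and t in subst:
        t = subst[t]
    return t

def unify(a, b, subst):
    """Return a substitution extended so that a and b are equal, or None (no occurs check)."""
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if isinstance(a, str):
        return {**subst, a: b}
    if isinstance(b, str):
        return {**subst, b: a}
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        subst = unify(x, y, subst)
        if subst is None:
            return None
    return subst

_fresh = itertools.count()

def rename(t, mapping):
    """Give a clause's variables fresh names so that each use of the clause is independent."""
    if isinstance(t, str):
        if t not in mapping:
            mapping[t] = f"{t}_{next(_fresh)}"
        return mapping[t]
    return (t[0],) + tuple(rename(x, mapping) for x in t[1:])

def solve(goals, program, subst):
    """Depth-first search: yield every substitution that solves all goals."""
    if not goals:
        yield subst
        return
    first, rest = goals[0], goals[1:]
    for head, body in program:  # backtracking happens when this loop moves on
        mapping = {}
        new_head = rename(head, mapping)
        new_body = [rename(g, mapping) for g in body]
        s = unify(first, new_head, subst)
        if s is not None:
            yield from solve(new_body + rest, program, s)

def resolve(t, subst):
    """Apply a substitution completely, for reading off answers."""
    t = walk(t, subst)
    if isinstance(t, str):
        return t
    return (t[0],) + tuple(resolve(x, subst) for x in t[1:])

Z = ("z",)

def S(n):
    return ("s", n)

# plus_z : plus z N N.    plus_s : plus (s M) N (s P) <- plus M N P.
PLUS = [
    (("plus", Z, "N", "N"), []),
    (("plus", S("M"), "N", S("P")), [("plus", "M", "N", "P")]),
]

# Query: is there an X such that plus (s z) X (s (s (s z))) is inhabited?
answer = next(solve([("plus", S(Z), "X", S(S(S(Z))))], PLUS, {}))
print(resolve("X", answer))  # ('s', ('s', ('z',)))

# Backtracking enumerates every way of splitting s (s z) into a sum.
for s in solve([("plus", "A", "B", S(S(Z)))], PLUS, {}):
    print(resolve("A", s), resolve("B", s))
```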

Twelf  is  the  current  generation  of  the  Elf  software  package  [59].  It  includes  all  of  the  logic 
programming  functionality  of  Elf,  plus  the  capability  to  check  that  a  logic  program  is  total  (roughly, 
that  all  queries  of  a  certain  form  have  answers).  In  addition,  it  includes  a  theorem  prover  capable 
of  proving  facts  about  the  existence  of  LF  terms  of  particular  types  —  however,  this  facility  is  not 
used  at  all  in  the  Crary-Sarkar  certified  code  methodology,  so  I  will  not  discuss  it  further. 

The  Twelf  meta-logic  is  the  method  of  using  the  totality  checker  of  Twelf  as  a  proof  assistant 
[30].  Under  a  slight  variation  of  the  well-known  programs-as-proofs  correspondence,  a  total  logic 
program  is  essentially  a  constructive  proof  of  a  "theorem"  in  that  it  shows  how,  given  closed  LF 
terms  of  certain  types,  to  find  closed  terms  of  other,  related  types.  Since  such  theorems  are  about 
the  existence  of  LF  terms,  and  LF  is  usually  instantiated  to  coincide  with  some  logic  of  interest, 
the  theorems  are  called  meta-theorems  and  the  programs  that  prove  them  meta-proofs.  Note  that, 
although  proofs  do  take  the  form  of  programs,  the  fact  that  they  are  logic  programs  (which  are  sets 
of constants with declared types and kinds) rather than functional programs (which are simply
λ-terms) means that the propositions they prove do not precisely correspond to anything in the LF
type theory itself, and certainly not to types as in the familiar Curry-Howard isomorphism.

2.1.2  The  Metalogical  Skeleton 

The  basic  operation  of  the  metalogical  certification  framework  is  as  follows.  The  two  principals 
involved  in  the  use  of  a  certification  system  are  the  producer,  who  generates  the  code,  and  the 
consumer,  who  wishes  to  run  it  on  his  or  her  computer.  The  consumer  in  general  does  not  trust 
the producer, but does trust his or her own computer, including its operating system, the
runtime environment in which the untrusted code will execute, and any part of the certification and
verification  machinery  over  which  he  or  she  has  control. 

The  first  step  is  to  specify  the  safety  policy.  As  with  other  approaches  to  foundational  certified 
code,  the  safety  policy  takes  the  form  of  an  operational  semantics  for  the  target  machine,  that  is, 
a  description  of  the  possible  states  of  the  machine  and  a  transition  relation  between  those  states. 
Any  program  whose  behavior  can  be  described  by  this  semantics  is  considered  safe;  thus  it  must 
differ from the "real" semantics of the hardware in that it must not contain any transitions
corresponding to behaviors the consumer wishes to rule out. In addition to this abstract machine
model,  the  safety  policy  provides  a  relation  between  programs  —  represented  at  this  foundational 
level  as  strings  of  bytes  —  and  machine  states  that  relates  any  given  program  to  all  of  the  possible 
initial  states  from  which  execution  of  that  program  might  begin. 

Figure 2.1 is an outline of the body of Twelf code needed for metalogical certification. It
contains the names and kinds of all the important "top-level" types, and indicates which principal,
the producer or the consumer, is responsible for filling in the definition of each. The three
components of the safety policy just described are covered by the first few lines of the outline.
Machine states are represented by objects of type state, and the relation transition defines the
dynamic semantics of the machine. Strings of bytes have type astring; the relation initial_state
connects programs, represented as strings of bytes, to machine states. It is the prerogative of the
code consumer, who will be running the certified binaries on his or her machine, to define the safety
policy, so it is the consumer who is responsible for providing the definitions of these LF types.

% The safety policy
state : type.                                 % consumer provides definition
transition : state -> state -> type.          % consumer provides definition
initial_state : astring -> state -> type.     % consumer provides definition

% Certificates and Validity
certificate : type.                           % producer provides definition
check : certificate -> astring -> type.
%mode check +CERT +AS.                        % producer provides definition

% The safety theorem
reaches : astring -> state -> type.
reaches_z : reaches AS S <- initial_state AS S.
reaches_s : reaches AS S
             <- reaches AS S'
             <- transition S' S.

safety : check CERT AS -> reaches AS S -> transition S S' -> type.
%mode safety +DC +DR -DT.                     % producer provides definition
%worlds () (safety _ _ _).
%total (safety _ _ _).                        % consumer checks totality

Figure 2.1: The skeleton of metalogical certified code.

The  rest  of  the  code  to  fill  in  the  outline  is  to  be  provided  by  the  code  producer.  The  producer's 
first  obligation  is  to  define  what  the  certificates  in  certified  binaries  will  look  like.  This  amounts 
to  filling  in  the  definition  of  the  type  certificate  in  the  figure.  This  is  where  the  flexibility  of 
the  metalogical  approach  begins  to  show:  the  framework  does  not  require  any  particular  form 
of  certificate,  so  the  code  producer  is  free  to  define  certificates  to  be  anything  from  terse  typing 
annotations  (as  in  TALx86)  to  oracle  strings  (as  in  PCC  [53])  to  proof  terms  in  higher-order  logic 
(as  in  FPCC  [3])  or  the  Calculus  of  Inductive  Constructions  (like  Hamid  et  al.  [28])  —  provided,  of 
course,  that  he  or  she  can  fulfill  the  remaining  obligations  using  certificates  of  the  form  chosen. 

Since  the  code  producer  gets  to  define  the  type  certificate,  he  or  she  must  also  define 
what  it  means  for  a  certified  binary  to  be  valid.  The  producer  accomplishes  this  by  filling  in 
the  definition  of  the  relation  check  as  shown  in  the  figure.  The  certified  binary  consisting  of 
program code AS : astring and certificate term CERT : certificate is considered valid if the type
check  CERT  AS  is  inhabited. 
The largest and most important contribution of the producer, of course, is the proof of the
safety  theorem.  The  safety  theorem  is  stated  in  terms  of  the  consumer's  safety  policy  and  the 
producer's  method  of  certification,  but  modulo  these  definitions  it  is  always  the  same.  It  says: 


If the program AS is well certified, then any state that can be encountered while
executing AS has a successor in the transition relation.


To  state  this  as  a  Twelf  metatheorem,  we  first  define  the  relation  reaches  between  programs 
(again,  represented  as  byte  strings)  and  machine  states  that  identifies  those  states  that  may  be 
encountered  while  executing  a  given  program.  In  particular,  the  type  reaches  AS  S  is  inhabited 
iff S is reachable via the transition relation from some initial state of AS. The kind, mode
and totality of the type safety together comprise the theorem: given (any program AS, any
certificate CERT, any state S and) evidence that CERT is a valid certificate for AS and that S is
reachable from an initial state of AS, a state S' can be found such that there is a transition from S
to S'. The code producer fills in a definition of safety that is well-moded and total, proving the
theorem. 

To  see  why  this  formulation  of  the  safety  theorem  makes  sense,  note  that  it  essentially  means 
that  the  execution  of  a  well-certified  program  will  never  get  stuck  with  respect  to  the  operational 
semantics  defined  in  the  safety  policy.  Presumably,  the  design  of  the  concrete  hardware  that  will 
run  the  program  defines  a  successor  for  every  possible  state  of  that  concrete  machine,  even  those 
that  the  consumer  considers  unsafe  and  wishes  to  avoid.  The  operational  semantics  in  the  safety 
policy,  on  the  other  hand,  is  constructed  on  purpose  to  lack  these  states  (or  the  transitions  that 
lead  to  them).  Assuming  that  the  safety  policy's  transition  relation  mirrors  the  behavior  of  the 
hardware  closely  enough,1  the  import  of  the  safety  theorem  is  that  when  a  well  certified  program  is 
executed  on  a  concrete  machine,  the  machine  will  never  visit  any  state  (or  perform  any  transition) 
not  covered  by  the  safety  policy.  In  other  words,  well  certified  programs  stay  within  the  realm  of 
allowable  behavior. 

The alert reader will also notice that this formulation of safety seems to imply that a
well-certified program never terminates. This is of little importance. If the safety policy were required
to  define  some  notion  of  "terminal"  state,  then  the  theorem  could  be  modified  to  say  that  any 
reachable  state  either  has  a  successor  or  is  terminal.  Crary  and  Sarkar  apparently  considered 
this  nonuniformity  irksome,  so  instead  the  safety  policy  endows  the  machine  with  an  imaginary 
"halted" state whose successor is itself. Instructions that terminate the certified program and
return control to the runtime system cause transitions into this state. In fact, this decision is quite
sensible  if  we  consider  the  mapping  between  states  of  the  concrete  machine  and  states  of  the 
safety  policy  (which  of  course  is  not  formally  defined,  because  the  states  of  the  concrete  machine 
are  not  formally  defined)  to  identify  concrete  states  that  are  indistinguishable  by  safe  programs. 
The  execution  of  the  concrete  machine  can  continue  indefinitely,  long  after  any  given  program  has 
terminated  —  but  of  course  no  program  can  observe  anything  that  happens  after  it  has  finished,  so 
it  is  fitting  that  a  "halting"  execution  of  the  abstract  machine  ends  with  an  unbounded  sequence 
of  transitions  between  indistinguishable  states. 
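For a machine small enough to enumerate, the shape of the safety theorem can be checked by brute force. The Python sketch below is a hypothetical toy and not the Crary-Sarkar policy. The four-instruction accumulator machine, the function names and the encoding of programs as instruction lists are all assumptions made for illustration. Unsafe behavior is modeled by omitting transitions, and termination by a halted state that steps to itself. The check computes the reachable states (the analogue of reaches) and confirms that each one has a successor. Exhaustive exploration works only because this machine has finitely many states. In the metalogical framework, the producer must instead prove the theorem once, in Twelf, for every well-certified program.

```python
HALTED = "halted"  # imaginary final state whose only successor is itself

def transition(prog, state):
    """Safety-policy semantics: the successor of `state`, or None if the machine is stuck.

    Unsafe behavior (dividing by zero, running off the end of the code) is
    simply given no transition.
    """
    if state == HALTED:
        return HALTED
    pc, acc = state
    if not 0 <= pc < len(prog):
        return None
    op, *args = prog[pc]
    if op == "add":
        return (pc + 1, (acc + args[0]) % 256)  # 8-bit accumulator keeps the state space finite
    if op == "div":
        return None if args[0] == 0 else (pc + 1, acc // args[0])
    if op == "jnz":  # jump to args[0] if the accumulator is nonzero
        return (args[0], acc) if acc != 0 else (pc + 1, acc)
    if op == "halt":
        return HALTED
    return None  # unknown instruction

def initial_states(prog):
    """Execution starts at pc 0 with an arbitrary accumulator value (prog is unused here)."""
    return [(0, acc) for acc in range(256)]

def reachable(prog):
    """States related to prog by `reaches`: the initial states closed under transition."""
    seen = set(initial_states(prog))
    frontier = list(seen)
    while frontier:
        nxt = transition(prog, frontier.pop())
        if nxt is not None and nxt not in seen:
            seen.add(nxt)
            frontier.append(nxt)
    return seen

def never_stuck(prog):
    """The shape of the safety theorem: every reachable state has a successor."""
    return all(transition(prog, s) is not None for s in reachable(prog))

print(never_stuck([("add", 1), ("div", 2), ("halt",)]))  # True
print(never_stuck([("div", 0), ("halt",)]))              # False: division by zero
print(never_stuck([("add", 1), ("jnz", 0)]))             # False: falls off the code when acc wraps to 0
print(never_stuck([("add", 1), ("jnz", 0), ("halt",)]))  # True, although most runs loop for a long time
```

The last program passes the check even though it may loop before halting, which matches the observation that this formulation of safety says nothing about termination.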


1 (and that issues of nondeterminism and underspecification are handled properly; this gets tricky and I won't discuss it)
2.1.3 Certified Binaries and Verification
Generic  and  Specific  Obligations 

As I have just explained, the producer must make several contributions to the process of
metalogical certification: he or she must define the type of certificates, define what it means for a certificate
to be valid, prove that programs that have valid certificates are safe, and provide a (valid)
certificate for each program he or she asks a consumer to run. The first two of these constitute the
definition of a safety condition; the producer claims that any program satisfying a certain property
(namely,  the  existence  of  a  valid  certificate)  is  safe  and  promises  to  demonstrate  that  any  program 
he  or  she  submits  for  execution  will  have  that  property  (by  providing  such  a  certificate).  The  next 
contribution,  the  proof  that  well  certified  programs  are  safe,  is  known  as  the  producer's  generic 
obligation  because  it  addresses  the  safety,  not  of  any  particular  program,  but  of  all  programs  that 
satisfy  the  safety  condition.  The  presentation  of  a  certificate  along  with  a  program  to  be  run  is 
called the producer's specific obligation, since it establishes the particular program in question as a
member  of  that  generic  class. 


Certified  Binaries  and  Verification 

The  file  format  for  certified  binaries  in  the  Crary-Sarkar 
system  is  called  TBF,  for  Trustless  Binary  Format.  Figure  2.2 
shows  the  layout  of  a  TBF  file.  A  certified  binary  contains  a 
magic  number  that  identifies  the  version  of  the  format,  the 
program  code,  a  second  magic  number  that  identifies  the 
safety  condition  the  certificate  is  supposed  to  satisfy,  and  the 
certificate.  The  code  is  IA-32  machine  code  in  raw  binary 
form.  The  certificate  consists  of  definitions  in  Twelf  syntax; 
it  may  contain  any  number  of  definitions,  but  the  last  one  in 
the  file  must  define  a  term  thecert  of  type  certificate. 

Upon  receiving  a  TBF  file  for  execution,  the  consumer 
consults  the  magic  numbers  in  the  file  to  determine  the  safety 
condition  the  program  is  supposed  to  satisfy.  If  the  consumer 
is  familiar  with  the  particular  safety  condition  claimed  and 
believes it to be sound, only the program-specific proof
obligation, the certificate itself, needs to be verified. If the safety
condition  is  unknown  to  the  consumer,  however,  the  producer  must  first  fulfill  the  generic  proof 
obligation  by  supplying  the  complete  definition  of  the  safety  condition  and  the  generic  safety 
proof. 

Since  safety  proofs  are  large  and  are  shared  between  programs,  they  are  not  included  in  TBF 
files. When the consumer encounters an unfamiliar safety condition, it contacts the producer and asks
for  the  Twelf  code  necessary  to  fill  in  the  ellipses  in  the  skeleton.  Upon  receiving  it,  the  consumer 
feeds  the  entire  "fleshed-out"  skeleton  into  Twelf  and  asks  the  program  to  check  the  totality  of  the 
safety  relation  —  remember  that  if  safety  is  a  total  relation  with  the  specified  mode,  then  it 
constitutes  a  proof  of  the  soundness  of  the  safety  condition.  If  Twelf  fails  to  verify  that  safety 
is  total,  the  consumer  has  no  basis  for  believing  the  untrusted  program  is  safe,  and  rejects  it.  If 
Twelf  succeeds  in  verifying  the  totality  of  safety,  then  the  consumer  can  not  only  go  on  to  check 
the  certificate  of  the  particular  program  in  question,  but  can  also  remember  this  generic  result 
and skip the step of safety proof verification when it encounters programs certified to this safety
condition in the future.

Figure 2.2: TBF File Layout
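The consumer's side of this protocol comes down to a small piece of control logic. The Python sketch below is hypothetical throughout. It does not parse the real TBF encoding, whose byte-level details are not given here. The two callables on SafetyCondition merely stand in for running Twelf's totality checker on the generic safety proof and for checking the program-specific certificate, and the format tag "TBF-v1" and condition tag "talt" are invented. What the sketch does show is the order of the checks and the caching of generic results.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

SUPPORTED_FORMAT = "TBF-v1"  # hypothetical stand-in for the format magic number

@dataclass(frozen=True)
class CertifiedBinary:
    format_magic: str     # identifies the version of the file format
    code: bytes           # raw machine code
    condition_magic: str  # identifies the safety condition the certificate claims
    certificate: str      # Twelf definitions ending with thecert (opaque here)

@dataclass(frozen=True)
class SafetyCondition:
    # Generic obligation: stands in for Twelf verifying that `safety` is total.
    generic_proof_checks: Callable[[], bool]
    # Specific obligation: stands in for deciding whether (check thecert theprog) is inhabited.
    certificate_checks: Callable[[str, bytes], bool]

class Consumer:
    def __init__(self, fetch_condition: Callable[[str], Optional[SafetyCondition]]):
        self.fetch_condition = fetch_condition  # asks the producer about an unfamiliar condition
        self.trusted: Dict[str, SafetyCondition] = {}  # conditions already shown to be sound

    def accept(self, binary: CertifiedBinary) -> bool:
        if binary.format_magic != SUPPORTED_FORMAT:
            return False
        cond = self.trusted.get(binary.condition_magic)
        if cond is None:
            cond = self.fetch_condition(binary.condition_magic)
            if cond is None or not cond.generic_proof_checks():
                return False  # no basis for believing the program is safe
            self.trusted[binary.condition_magic] = cond  # remember the generic result
        return cond.certificate_checks(binary.certificate, binary.code)

fetches = []

def fetch(magic):
    fetches.append(magic)
    return SafetyCondition(lambda: True, lambda cert, code: cert == "ok")

consumer = Consumer(fetch)
print(consumer.accept(CertifiedBinary("TBF-v1", b"\x90\xc3", "talt", "ok")))  # True
print(consumer.accept(CertifiedBinary("TBF-v1", b"\xc3", "talt", "bogus")))   # False
print(fetches)  # ['talt']: the generic proof was fetched and checked only once
```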

Once the soundness of the safety condition is established, the consumer moves on to
checking that the particular program under consideration is well certified. The basic idea is simply to
translate the sequence of bytes in the code section of the TBF file into an LF term theprog of type
astring and check that the type (check thecert theprog) is inhabited, but as of this writing
there  are  two  different  mechanisms  under  development  for  accomplishing  this.  The  first  relies  on 
the  logic  programming  features  of  Twelf  to  check  the  certificate;  the  second  requires  the  producer 
to  supply  a  checker  written  in  a  different  language. 

The  first  approach,  which  has  been  under  development  longer  and  is  closer  to  completion, 
is  to  assume  that  the  safety  condition  check  is  defined  in  such  a  way  that  it  can  be  interpreted 
as  an  Elf  logic  program.  Under  this  approach,  the  consumer  simply  passes  the  query  "check 
thecert  theprog"  to  the  Twelf  interpreter.  If  the  query  succeeds,  then  the  type  named  by  the 
query  is  inhabited  and  the  program  is  well  certified;  if  the  query  fails  (or  fails  to  terminate  after 
a  reasonable  period  of  time),  then  the  program  is  rejected.  The  second  approach  is  the  subject 
of  Susmit  Sarkar's  forthcoming  Ph.D.  thesis.  Briefly,  it  requires  the  producer  to  submit,  along 
with  the  definition  of  the  safety  condition  and  the  proof  of  its  soundness,  a  checker  for  certified 
programs.  These  checkers  are  to  be  written  in  a  functional  language  with  a  highly  specialized 
type  system  capable  of  guaranteeing  that  any  well-typed  checker  is  correct  with  respect  to  the  LF 
definition  of  the  safety  condition. 

Regardless  of  which  approach  is  used  to  verify  the  correctness  of  the  certificate,  once  this  step 
is  complete  the  consumer  should  believe  that  the  program  obeys  the  safety  policy  and  be  willing 
to  run  it.  Running  a  program  simply  consists  of  loading  the  bytes  from  the  code  section  of  the  TBF 
file  into  an  executable  area  of  memory  and  jumping  to  the  first  instruction  in  a  manner  consistent 
with  the  safety  policy. 

2.2  TALT 

The  most  effective  known  way  to  fulfill  a  code  producer's  proof  obligations  in  a  certified  code 
setting  is  to  use  a  type  system.  Indeed,  in  early  non-foundational  certified  code  architectures 
(PCC and TAL), the safety policy itself was expressed as a type system. In all foundational
proof-carrying code implementations that I am aware of, proofs of safety are constructed using a type
system:  a  program  is  shown  to  be  well-typed,  and  this  fact  is  shown  to  imply  its  adherence  to  the 
safety  policy.  Metalogical  certification  is  no  exception  to  this  pattern,  as  the  only  safety  conditions 
and  safety  proofs  that  have  been  created  for  use  within  the  framework  so  far  have  been  based  on 
type  systems.  The  details  of  the  implementation  have  evolved  slightly  since  it  was  first  conceived 
and  presented  by  Crary  [14],  but  there  has  always  been  a  type  system  at  the  core  of  the  safety 
argument.  That  type  system  is  known  as  TALT,  and  it  was  the  starting  point  for  the  type  system 
design  I  will  discuss  in  detail  in  upcoming  chapters.  In  this  section,  I  describe  the  most  important 
features of TALT.

2.2.1  TALT,  XTALT  and  EXTALT 

In  an  ideal  instantiation  of  the  Crary-Sarkar  framework,  the  safety  argument  might  be  built  around 
a  "type  system  for  machine  language,"  that  is,  a  system  of  formal  judgments  that  apply  directly 
to sequences of bytes and characterize their structure as data and their behavior as code. A
certificate for a particular string of bytes would simply be a typing derivation, possibly compressed so
as to remove information that could be reconstructed by examining the byte string and certificate
together. Validity of a certificate would be checked by attempting this reconstruction and, if
successful, checking that the resulting derivation proves the appropriate judgment. The safety proof
would be based on progress and type preservation lemmas for the type system with respect to the
safety policy's operational semantics.

Figure 2.3: Overview of the TALT safety structure.
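The reconstruct-then-check idea, and the progress-and-preservation intuition behind it, can be illustrated with a toy typed assembly language. The Python sketch below rests on loudly stated assumptions. Its five instructions, its label-annotation certificates and all of its function names are hypothetical and far simpler than TALT. The certificate records only the register types each label expects on entry. The checker reconstructs the types at every intermediate instruction, which is the kind of information a compressed certificate can omit. Running the untyped machine then shows a well-typed program that never gets stuck and an ill-typed one that does.

```python
# A toy typed assembly language, loosely in the spirit of TAL; NOT TALT.
# Instructions: ("mov", r, n) puts integer n in register r; ("movl", r, L) puts code
# label L in r; ("add", rd, rs) adds two integer registers; ("jmp", r) jumps to the
# label held in r; ("halt",) stops. Register types are "int" or ("code", L).
# The certificate maps each label to its precondition: register types required on entry.

def check_block(instrs, pre, cert):
    """Reconstruct the register types through one block, checking each instruction."""
    regs = dict(pre)
    for ins in instrs:
        op = ins[0]
        if op == "mov":
            regs[ins[1]] = "int"
        elif op == "movl":
            if ins[2] not in cert:
                return False
            regs[ins[1]] = ("code", ins[2])
        elif op == "add":
            if regs.get(ins[1]) != "int" or regs.get(ins[2]) != "int":
                return False
        elif op == "jmp":
            t = regs.get(ins[1])
            if not (isinstance(t, tuple) and t[0] == "code") or t[1] not in cert:
                return False
            # The current registers must satisfy the target label's precondition.
            return all(regs.get(r) == ty for r, ty in cert[t[1]].items())
        elif op == "halt":
            return True
        else:
            return False
    return False  # a block must end in jmp or halt rather than fall off its end

def check_program(prog, cert, entry):
    """Valid iff every label is annotated, the entry expects nothing, and all blocks check."""
    return (entry in cert and set(prog) == set(cert) and cert[entry] == {}
            and all(check_block(prog[label], cert[label], cert) for label in prog))

def run(prog, entry, fuel):
    """Untyped execution: returns "halted", "stuck", or "running" if fuel runs out."""
    regs, label, pc = {}, entry, 0
    for _ in range(fuel):
        if pc >= len(prog[label]):
            return "stuck"
        op, *args = prog[label][pc]
        if op == "mov":
            regs[args[0]] = args[1]
        elif op == "movl":
            regs[args[0]] = ("label", args[1])
        elif op == "add":
            a, b = regs.get(args[0]), regs.get(args[1])
            if not (isinstance(a, int) and isinstance(b, int)):
                return "stuck"
            regs[args[0]] = a + b
        elif op == "jmp":
            v = regs.get(args[0])
            if not (isinstance(v, tuple) and v[0] == "label"):
                return "stuck"
            label, pc = v[1], 0
            continue
        elif op == "halt":
            return "halted"
        else:
            return "stuck"
        pc += 1
    return "running"

loop_prog = {
    "start": [("mov", "r1", 1), ("movl", "r2", "loop"), ("jmp", "r2")],
    "loop": [("add", "r1", "r1"), ("jmp", "r2")],
}
loop_cert = {"start": {}, "loop": {"r1": "int", "r2": ("code", "loop")}}
print(check_program(loop_prog, loop_cert, "start"), run(loop_prog, "start", 1000))  # True running

bad_prog = {"start": [("movl", "r1", "start"), ("add", "r1", "r1"), ("halt",)]}
print(check_program(bad_prog, {"start": {}}, "start"), run(bad_prog, "start", 1000))  # False stuck
```

Running a program for a bounded number of steps is evidence rather than proof. A real safety argument establishes progress and preservation once, for all well-typed programs.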

The  notion  of  a  "type  system  for  machine  language"  is  so  vague  that  it  could  probably  be 
argued  that  this  is  how  TALT  works.  However,  such  an  argument  would  stretch  the  limits  of  what 
most  practitioners  in  the  field  mean  by  most  of  the  terms  involved.  The  intuitions  behind  the  TALT 
system  are  therefore  best  understood  by  casting  it  in  a  somewhat  different  light,  which  reveals 
at  least  three  closely  related  but  distinct  type  systems  and  two  different  operational  semantics. 
As I have already explained, the safety policy defines an abstract machine and its operational semantics; the "TALT abstract machine" is defined in the course of the safety proof and is the focus of much of the type safety argument. The TALT type system itself is a type assignment system for the TALT abstract machine; XTALT is an explicitly typed version of TALT used in certificates; and EXTALT is the more user-friendly input language of the certifying assembler.

The  relationships  between  these  components  are  sketched  in  Figure  2.3.  The  TALT  abstract 
machine  is  the  core  of  the  system.  This  abstract  machine  is  intended  to  closely  resemble  the  one 
whose  operational  semantics  constitute  the  safety  policy  (which  in  turn  is  an  abstracted  view  of 
the  IA-32)  but  to  be  abstract  enough  to  insulate  the  type  safety  proof  from  machine  details  such 
as instruction encoding and the precise layout of a program's address space. Differences between the TALT abstract machine and the safety policy include:

• The TALT abstract machine distinguishes between instructions and the bytes that encode them, whereas in the safety policy the only values are bytes or sequences of bytes. In addition, the instruction set of the abstract machine has some instructions that do not exist in the safety policy or in the concrete IA-32 instruction set. Each of these "fake instructions" is encoded by a sequence of zero or more IA-32 instructions.

• The TALT abstract machine allows arbitrary combinations, and arbitrary-depth nestings, of operand and destination addressing modes. This is for the sake of uniformity; combinations that do not correspond to real IA-32 instructions will simply never come up in actual programs.

•  The  TALT  abstract  machine  treats  the  stack  as  a  special  object,  distinct  from  the  heap,  whereas 
the  safety  policy  treats  all  of  memory  uniformly  and  treats  the  stack  pointer  register  just  like 
the  other  general-purpose  registers. 

As the figure shows (column (d)), the TALT type system is sound with respect to the TALT abstract machine, which in turn is related to the abstract machine of the safety policy by a simulation theorem. These two facts together imply that if the concrete machine starts in a state that "implements" a well-typed abstract machine state, the subsequent evaluation will not get stuck.

One  important  characteristic  the  safety  policy  and  the  TALT  abstract  machine  have  in  common 
is  that  they  are  basically  untyped.  Values,  instruction  sequences,  and  states  of  the  TALT  abstract 
machine  are  all  completely  free  of  typing  annotations.2  The  type  system  properly  called  TALT  is  a 
type  assignment  system,  or  a  Curry-style  type  system,  for  the  TALT  abstract  machine. 

TALT is a fairly powerful type system, including (among other things) System F-style polymorphism. Since typing is undecidable for Curry-style System F [70], it is presumed to be undecidable for TALT as well. As a result, a certificate for the binary representation of a TALT program must provide not only enough information to parse the machine code as a sequence of TALT instructions and values, but also enough typing information to reconstruct a typing derivation for the implicitly typed TALT code. In the current implementation, these requirements are met by using an explicitly typed program as the certificate. The language of explicitly typed TALT programs is called XTALT; the types of XTALT are the same as those of TALT, but the term language is very different, and so are the typing rules.

An XTALT program is a sequence of blocks, each with a label and an explicit type annotation. (As in conventional assembly language, labels are intended to denote the memory addresses at which their associated blocks reside.) A theory of coercions takes the place of TALT's rich subtyping relation, and the syntax of instructions and values is constructed so as to make syntax-directed type-checking of XTALT programs feasible. No operational semantics is directly defined for XTALT; instead, a relation called elaboration is defined between XTALT programs and TALT values: if a program X elaborates to V, then V is the TALT representation of the sequence of bytes encoding X. Thus XTALT is built for the convenience of the type-checker, while TALT is built for the convenience of the safety proof.
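To make the division of labor concrete, the following Python sketch models an XTALT-like program as labeled blocks carrying explicit type annotations and coercions, and defines a toy "elaboration" that simply erases that typing information, leaving untyped, TALT-like code. The class and field names (`XInstr`, `XBlock`, `coercions`) and the coercion name `roll` are hypothetical; the real elaboration relation is defined in LF and also fixes the byte-level encoding, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class XInstr:
    """An explicitly typed instruction (hypothetical representation)."""
    opcode: str
    operands: Tuple[str, ...]
    coercions: Tuple[str, ...] = ()   # typing information used only by the checker


@dataclass(frozen=True)
class XBlock:
    """A labeled block carrying an explicit type annotation."""
    label: str
    type_annotation: str
    instrs: Tuple[XInstr, ...]


UntypedBlock = Tuple[str, Tuple[Tuple[str, Tuple[str, ...]], ...]]


def elaborate(program: List[XBlock]) -> List[UntypedBlock]:
    """Toy elaboration: erase annotations and coercions, keep the untyped code."""
    return [
        (blk.label, tuple((ins.opcode, ins.operands) for ins in blk.instrs))
        for blk in program
    ]
```

The type-checker would consume `type_annotation` and `coercions`; the safety argument only ever sees what `elaborate` returns.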

The assembler that produces certified binaries must therefore output an LF representation of the XTALT representation of the machine code it generates. The input to the assembler is in a third language, which for the purposes of this thesis I will call EXTALT (because it is the external language of the assembler). At present EXTALT does not differ significantly from XTALT. As I will describe in the next chapter, however, my resource-bounded versions EXTALT-R and XTALT-R do differ from each other in a significant way, with an important implication for the assembler that must translate between them.

2Crary [11] has stated that this was simply more convenient when proving the safety theorem, but I think it can also be justified on aesthetic grounds: it avoids putting much more distance between the abstract and concrete operational semantics.
Figure 2.3 also shows how certificate verification in TALT works. When a TBF file is received for execution (column (a)), its two components are extracted. The code is a sequence of bytes (an LF term of type astring), and the certificate is an XTALT program (also represented as an LF term). The safety condition consists of the relationships depicted in column (b): the XTALT program must be well-typed, and the byte string must in fact be an encoding of that program. Verification of a certified binary amounts to checking that these two relationships hold. That is all that need be checked for any particular program: the generic safety proof for TALT argues that if the safety condition is met, then the relationships indicated by dashed lines in column (c) also hold. The XTALT program determines an initial state of the abstract machine, and the byte string determines an initial state of the safety policy's machine. The safety condition implies that the abstract state is well typed and the concrete state implements the abstract state; thus, by the type safety and simulation theorems, execution of the program will not get stuck.
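The per-program check is therefore a conjunction of two tests. The Python sketch below shows only that shape: `verify` accepts a certified binary exactly when the certificate type-checks and the code bytes are its encoding. The certificate format, `toy_typechecks` (which merely checks that jump targets are defined labels) and `toy_encode` (whose operand encoding is invented, though the opcode bytes are borrowed from IA-32) are hypothetical stand-ins for XTALT checking and the real encoding relation.

```python
from typing import Callable, Dict, List, Tuple

# A toy certificate: labeled blocks of (opcode, operand) pairs. Hypothetical format.
Block = Tuple[str, List[Tuple[str, str]]]
Cert = List[Block]

# Illustrative byte values; the operand encoding below is a toy, not IA-32's.
TOY_OPCODES: Dict[str, int] = {"nop": 0x90, "ret": 0xC3, "jmp": 0xE9}


def toy_encode(cert: Cert) -> bytes:
    """Encode a toy certificate; a jump operand becomes the target block's index."""
    index = {label: i for i, (label, _) in enumerate(cert)}
    out = bytearray()
    for _, instrs in cert:
        for op, arg in instrs:
            out.append(TOY_OPCODES[op])
            if op == "jmp":
                out.append(index[arg])
    return bytes(out)


def toy_typechecks(cert: Cert) -> bool:
    """Stand-in for XTALT type checking: every jump target must be a defined label."""
    labels = {label for label, _ in cert}
    return all(arg in labels
               for _, instrs in cert for op, arg in instrs if op == "jmp")


def verify(code: bytes, cert: Cert,
           typechecks: Callable[[Cert], bool] = toy_typechecks,
           encode: Callable[[Cert], bytes] = toy_encode) -> bool:
    """The safety condition: the certificate is well-typed AND the code encodes it."""
    return typechecks(cert) and encode(cert) == code
```

Nothing in `verify` mentions the abstract machine or the simulation relation; those appear only in the generic safety proof, which is why the per-program check can be this small.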


2.3  MiniTALT 


Unfortunately, none of the variations of TALT just described will do for the purposes of presentation in this thesis. The typing annotations of EXTALT and XTALT are bulky and cumbersome, although these languages do have the advantage of being geared toward the presentation and static analysis of programs rather than proofs of safety. TALT itself, being a type assignment system, is concise, but really applies to machine states and values rather than to a convenient notion of "program". Furthermore, and most annoyingly for human readability, the TALT abstract machine deals very explicitly with the operands of jump instructions, which (when not indirect) are almost always pc-relative; it is also explicit about the sizes of instruction encodings, which on the IA-32 vary from one instruction to another. Thus finding the target of a direct jump instruction requires knowing the sizes of all the instructions between the jump and the target, which is not something readers of a thesis should be asked to do in order to understand simple examples.
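The Python sketch below illustrates the arithmetic a reader would otherwise have to do. It assumes, as for the usual relative forms of IA-32 `jmp` and `jcc`, that the displacement is measured from the address of the instruction following the jump. The function names are hypothetical, and the instruction sizes in the usage example are made up.

```python
from typing import List


def offsets(sizes: List[int]) -> List[int]:
    """Byte offset of each instruction, given the encoded size of every instruction."""
    out, pos = [], 0
    for s in sizes:
        out.append(pos)
        pos += s
    return out


def rel_displacement(sizes: List[int], jump_index: int, target_index: int) -> int:
    """Displacement a pc-relative jump must encode: measured from the end of the
    jump instruction, i.e. from the address of the next instruction."""
    offs = offsets(sizes)
    next_pc = offs[jump_index] + sizes[jump_index]
    return offs[target_index] - next_pc


if __name__ == "__main__":
    sizes = [3, 2, 5, 6, 5]                # hypothetical encoded sizes
    print(rel_displacement(sizes, 4, 1))   # backward jump from the last instruction
```

Every displacement depends on the sizes of all the instructions in between, so changing one encoding anywhere in that range changes the number.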


Therefore, for this thesis, I adopt a compromise. For the purposes of all the code examples, translations and proofs herein I will use a block-structured, implicitly typed language called MiniTALT. Like TALT, MiniTALT is implicitly typed, so examples are unburdened by typing annotations (except as comments where they are helpful). Like XTALT, MiniTALT treats a program as an entity unto itself rather than as a value in the memory of a machine. A program is a sequence of blocks, each with a label; a label can occur as an operand, where it is intended to denote a pc-relative reference to the location where its associated block resides. Table 2.1 summarizes these relationships.

Table 2.1: Variants of TALT

              Implicitly typed    Explicitly typed
    Blocks    MiniTALT            XTALT
    Flat      TALT                (none)


In addition to its more readable syntax, the MiniTALT type system I will use in this thesis is a considerably simpler theory than the TALT actually implemented as an instance of the metalogical certification framework. Specifically, I have removed many types that do not appear in any of the code examples in the thesis, and many inference rules that do not play a role in any of the typings we will encounter. For the most part, these omissions merely serve to exclude distracting information, much of which has already been covered by Crary and Sarkar [14, 15].
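$$
\begin{array}{rcl}
W & = & 4 \qquad \text{(word size in bytes)} \\
B \in \mathit{Wordval} & = & \{0, \ldots, 2^{8W} - 1\} \\
r \in \mathit{Reg} & = & \{\mathtt{eax}, \mathtt{ebx}, \mathtt{ecx}, \mathtt{edx}, \mathtt{esi}, \mathtt{edi}, \mathtt{ebp}\} \\
\mathfrak{r} \in \mathit{Genreg} & = & \mathit{Reg} \cup \{\mathtt{esp}\}
\end{array}
$$

Figure 2.4: Machine-Specific Notation for IA-32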


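$$
\begin{array}{llcl}
\text{Operands} & o & ::= & B \mid \ell \mid \mathfrak{r} \mid {}^{i}[o + j] \mid {}^{i}[o_1 + j + j' \cdot o_2] \\
\text{Destinations} & d & ::= & \mathfrak{r} \mid {}^{i}[o + j] \mid {}^{i}[o_1 + j + j' \cdot o_2] \\
\text{Conditions} & \kappa & ::= & \mathtt{e} \mid \mathtt{ne} \mid \mathtt{b} \mid \mathtt{be} \mid \mathtt{a} \mid \mathtt{ae} \mid \mathtt{o} \mid \mathtt{no} \\
\text{Instruction sequences} & I & ::= & \mathtt{add}\ d, o_1, o_2\ I \mid \mathtt{addsptr}\ d, o, n\ I \mid \mathtt{call}\ o\ \ell \mid \mathtt{cmp}\ o_1, o_2\ I \\
 & & \mid & \mathtt{cmpjcc}\ o_1, o_2, \kappa, o_3\ I \mid \mathtt{halt} \mid \mathtt{jcc}\ \kappa, o\ I \mid \mathtt{jmp}\ o\ I \\
 & & \mid & \mathtt{malloc}\ d, n\ I \mid \mathtt{mallocarr}\ d, n, o\ I \mid \mathtt{mov}\ d, o\ I \mid \mathtt{pop}\ n, d\ I \\
 & & \mid & \mathtt{push}\ o\ I \mid \mathtt{ret}\ I \mid \mathtt{salloc}\ n\ I \mid \mathtt{sfree}\ n\ I \mid \mathtt{sub}\ d, o_1, o_2\ I \\
\text{Programs} & P & ::= & \ell_1 = I_1, \ldots, \ell_n = I_n
\end{array}
$$

Figure 2.5: MiniTALT Program Syntax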
2.3.1 Basic Syntax

The syntax of MiniTALT programs is given in Figures 2.4 and 2.5. Figure 2.4 defines some basic notation that is specific to the IA-32 architecture: the word size W = 4 and the names of the registers. Figure 2.5 has the syntax for operands, destinations, instruction sequences and programs.

Most of the MiniTALT syntax should be recognizable to those familiar with Intel-style assembler syntax [38]. Unlike informal accounts of conventional assembler syntax, MiniTALT distinguishes between operands, which produce values, and destinations, where values can be stored. Also, unlike the concrete IA-32 instruction set, MiniTALT does not restrict the combinations of operands and destinations that can appear together in an instruction.

An operand is either a literal word, a label, a register, or a memory operand. A destination may be any of these except for a literal or label.3 Memory operands are annotated with the size (in bytes) of the value being fetched from memory; thus $^{1}[\mathtt{eax} + 2]$ means a single byte, located at offset 2 from the pointer in eax. I will elide the size prefix for word-sized operands, and I will elide the offset (the $j$ in $[o + j]$) when it is zero. Thus the 4-byte word pointed to by ebp can be written simply [ebp]. For array element operands $^{i}[o_1 + j + j' \cdot o_2]$, the scaling factor $j'$ may be elided if it is 1. Similar conventions apply to destinations.
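The elision conventions are mechanical enough to state as a pretty-printer. In the Python sketch below, the classes `Reg`, `Label` and `Mem` are hypothetical, and the ASCII form `1:[eax + 2]` is an assumed stand-in for the size prefix written as a superscript in the text. `is_destination` encodes the rule that destinations are operands other than literals and labels.

```python
from dataclasses import dataclass
from typing import Optional, Union

W = 4  # IA-32 word size in bytes


@dataclass(frozen=True)
class Reg:
    name: str


@dataclass(frozen=True)
class Label:
    name: str


@dataclass(frozen=True)
class Mem:
    """Memory operand size[base + offset] or size[base + offset + scale*index]."""
    base: "Operand"
    offset: int = 0
    size: int = W
    index: Optional["Operand"] = None
    scale: int = 1


Operand = Union[int, Label, Reg, Mem]   # literal word, label, register, memory


def show(o: Operand) -> str:
    """Render an operand, eliding word sizes, zero offsets and unit scales."""
    if isinstance(o, (Reg, Label)):
        return o.name
    if isinstance(o, int):
        return str(o)
    parts = [show(o.base)]
    if o.offset != 0:
        parts.append(str(o.offset))
    if o.index is not None:
        idx = show(o.index)
        parts.append(idx if o.scale == 1 else f"{o.scale}*{idx}")
    body = "[" + " + ".join(parts) + "]"
    return body if o.size == W else f"{o.size}:{body}"


def is_destination(o: Operand) -> bool:
    """Destinations are operands other than literal words and labels."""
    return isinstance(o, (Reg, Mem))
```

For example, `show(Mem(Reg("eax"), 2, size=1))` gives the one-byte operand at offset 2 from eax, and `show(Mem(Reg("ebp")))` gives plain `[ebp]`.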

Many of the instructions of MiniTALT (e.g., add, jmp) are familiar IA-32 instructions. The call instruction is slightly unusual in that each call specifies the label that should be used as the return address when it is executed. (This is for the convenience of the abstract operational semantics, so that every code pointer ever manipulated by the machine corresponds to a label that occurs in the program text.) The instruction call o ℓ will therefore usually mean "Call the function o, and when it returns, continue with the code at ℓ." This is more general than the call of concrete IA-32 implementations, in which the return address is always the address of the instruction immediately following the call. For readability, code examples will be written in a style that appeals to this intuition: e.g., the two-block program fragment ℓ1 = call o ℓ2, ℓ2 = jmp ℓ1 will be written with a single block, as ℓ1 = call o jmp ℓ1.
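The single-block notation is only sugar, and recovering the underlying program is a simple splitting pass. The Python sketch below assumes a sugared call is written without its return label, and it invents return labels with a hypothetical `<label>_ret<n>` scheme that ignores possible clashes with existing labels. It splits each block after every call and makes the call name the new block as its return label.

```python
from itertools import count
from typing import Dict, List, Tuple

Instr = Tuple[str, ...]   # e.g. ("call", "o") in sugared form, ("jmp", "l1")


def desugar(blocks: Dict[str, List[Instr]]) -> Dict[str, List[Instr]]:
    """Split blocks after each call, so every call names an explicit return label."""
    fresh = count(1)
    out: Dict[str, List[Instr]] = {}
    for label, instrs in blocks.items():
        current, body = label, []
        for ins in instrs:
            if ins[0] == "call":
                ret = f"{label}_ret{next(fresh)}"      # hypothetical fresh-label scheme
                out[current] = body + [ins + (ret,)]   # "call o ret" ends the block
                current, body = ret, []
            else:
                body.append(ins)
        if not body:
            raise ValueError(f"block {current} would be empty")
        out[current] = body
    return out
```

Applied to the example in the text, `{"l1": [("call", "o"), ("jmp", "l1")]}` becomes a block `l1` ending in `call o l1_ret1` and a block `l1_ret1` containing `jmp l1`.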

Most of the other instructions of MiniTALT are implemented straightforwardly with IA-32 instructions or short sequences of instructions. The addsptr instruction, for example, adds a constant value to a pointer into the stack; this is implemented by an add instruction on actual hardware (that is, the same bytes that encode addsptr also encode add), but since the abstract machine treats pointers differently from integers, and the stack as distinct from the heap, a different syntax and typing rule are needed at the type level. Similarly, sfree is "really" just an add instruction.

The remaining instructions of MiniTALT are implemented by sequences of two or more IA-32 instructions. For instance, a cmpjcc is implemented by a cmp followed by a jcc; the combined instruction at the TALT level has a special typing rule allowing typings that could not easily be achieved with two separate instructions.4 Less obviously, salloc, which allocates space on the stack, is implemented by a sequence of two instructions: a sub to decrement the stack pointer, and a mov into the newly allocated space that triggers a page fault if the stack has overflowed. Finally, the malloc and mallocarr instructions are implemented as call instructions that invoke functions in the runtime library to allocate space in the heap.
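The correspondence between MiniTALT instructions and concrete IA-32 code can be summarized as a table of expansions. The Python sketch below works at the level of mnemonics only and omits operands, so it says nothing about actual byte encodings. The runtime entry-point names `rt_malloc` and `rt_mallocarr` are hypothetical.

```python
from typing import Dict, List

# Mnemonic-level expansions of MiniTALT pseudo-instructions, following the
# prose description; operands are omitted, and the runtime entry-point names
# "rt_malloc" and "rt_mallocarr" are hypothetical.
EXPANSIONS: Dict[str, List[str]] = {
    "addsptr":   ["add"],           # the same bytes as an ordinary add
    "sfree":     ["add"],           # also "really" an add
    "cmpjcc":    ["cmp", "jcc"],
    "salloc":    ["sub", "mov"],    # decrement esp, then a mov that probes the new space
    "malloc":    ["call rt_malloc"],
    "mallocarr": ["call rt_mallocarr"],
}


def expand(program: List[str]) -> List[str]:
    """Replace each pseudo-instruction by its IA-32 mnemonic sequence."""
    out: List[str] = []
    for op in program:
        out.extend(EXPANSIONS.get(op, [op]))   # ordinary instructions map to themselves
    return out
```

The table makes the point of the preceding paragraphs explicit: several distinct MiniTALT instructions share one concrete encoding. They differ only in how the type system treats them.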

3In conventional IA-32 assembly language a label may be used as a destination; this is disallowed in TALT to keep code position-independent.

4"Easily" is a weasel word here. Presumably one could come up with special rules for cmp and jcc that relied on some ad hoc device to make sure they were only ever used when the instructions appeared together. I do not even mean to suggest that that would be a bad idea, but it is not how TALT works.
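$$
\begin{array}{llcl}
\text{Kinds} & K & ::= & T \mid T_i \mid T_D \mid N \mid K_1 \to K_2 \\
\text{Static terms} & c, \tau, x & ::= & \alpha \mid \mathit{ns}_i \mid B_0 \mid B_i \mid \tau_1 \times \tau_2 \mid \tau \uparrow x \mid \mathtt{box}(\tau) \mid \mathtt{mbox}(\tau) \mid \mathtt{sptr}(\tau) \\
 & & \mid & \Gamma \to 0 \mid \mathtt{set}_{=}(x) \mid \mathtt{set}_{<}(x) \mid \mathtt{set}_{>}(x) \mid \forall \alpha{:}K.\tau \mid \exists \alpha{:}K.\tau \\
 & & \mid & \tau_1 \wedge \tau_2 \mid \tau_1 \vee \tau_2 \mid \mathtt{void} \mid \mu\alpha.\tau \mid n \mid \lambda \alpha{:}K.c \mid c_1\, c_2 \\
\text{Static contexts} & \Delta & ::= & \cdot \mid \Delta, \alpha{:}K \mid \Delta, \phi\ \mathit{true} \\
\text{Register file types} & \Gamma & ::= & \{\mathtt{eax}{:}\tau_{\mathtt{eax}}, \ldots, \mathtt{ebp}{:}\tau_{\mathtt{ebp}}, \mathtt{esp}{:}\tau_{\mathtt{sp}}\} \\
\text{Memory types} & \Psi & ::= & \{\ell_1 : \tau_1, \ldots, \ell_n : \tau_n\}
\end{array}
$$

Figure 2.6: MiniTALT Type System Syntax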


2.3.2 Type System

The syntax of the MiniTALT type system is given in Figure 2.6, and the judgment forms are summarized in Table 2.2. At the top level of the system are four kinds, which classify the terms at the second level, which we call static terms. The class of static terms comprises the types (of kinds $T_i$, $T_D$ and $T$) and the number terms (of kind $N$). By convention, I will use the metavariables $\tau$ and $x$ in place of the general metavariable $c$ to indicate that the static term referred to is a type or a number term, respectively. Furthermore, I will use the letter $a$ instead of $\alpha$ for variables intended to be of kind $N$. The only terms of kind $N$ other than variables are the numerals, written $n$, where $n$ is a nonnegative integer. (The set of static terms will be expanded significantly when I extend MiniTALT to MiniTALT-R in Chapter 3.) I have chosen to call the syntactic category containing the types "static terms" rather than the more usual "type constructors" (or simply "constructors") because although number terms may appear in types, they cannot really be said to construct anything. The name "static terms" also highlights my intention that these terms are part of the (static) type assignment system only; they do not appear in raw MiniTALT programs.

As usual, the role of types is to classify values. The need for three different kinds for types comes from an unusual feature of TALT, namely that values are not all the same size; in fact, it is not even the case that values of the same type have the same size (for example, consider the type $B_0 \vee B_4$). For each natural number $n > 0$, $T_n$ is the kind of types whose values are exactly $n$ bytes in size. $T_D$ is the kind of types $\tau$ such that all values of type $\tau$ are the same size. That is, if $\tau$ has kind $T_D$ and $v_1$ and $v_2$ have type $\tau$, then $v_1$ and $v_2$ must be the same size; types of kind $T_D$ "determine" the size of the values they contain. Thus any type of kind $T_i$ also has kind $T_D$. $T$ is the kind of any type whatsoever, even those types that contain values of more than one size. Most of the values manipulated by MiniTALT-R code will have types of kind $T_W$, where $W$ is the word size of the architecture (for IA-32, $W = 4$). The notable exception is the stack, which is permitted to vary greatly in size and hence usually has a type of kind $T_D$.

For $i > 0$, the "nonsense" type $\mathit{ns}_i$ may be given to any value whatsoever of size $i$; any type of kind $T_i$ is therefore a subtype of $\mathit{ns}_i$. For $i > 0$, $B_i$ is the type of integer values $i$ bytes in width. Values of the product type $\tau_1 \times \tau_2$ consist of a value of type $\tau_1$ and one of type $\tau_2$ appended together; hence if $\tau_1 : T_i$ and $\tau_2 : T_j$ then the product has kind $T_{i+j}$. There are subtyping rules that make the product constructor associative and $B_0$ a unit. The array type $\tau \uparrow x$, where $x$ is a number term, describes values that consist of $x$ values of type $\tau$ appended together. Thus if $\tau$ has kind $T_i$, then $\tau \uparrow n$ has kind $T_{ni}$.
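The size discipline behind the kinds $T_i$ and $T$ can be mimicked by a function that computes a type's size when it is determined. The Python sketch below covers integer, product, array and union types only, and its class names are hypothetical. It returns `None` when values of the type may differ in size, and it deliberately ignores $T_D$, which in the real system also covers types whose size is fixed but not statically known.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class B:
    """B_i: an integer i bytes wide (B_0 is the unit of the product)."""
    width: int


@dataclass(frozen=True)
class Prod:
    """tau_1 x tau_2: a tau_1 value followed by a tau_2 value."""
    left: "Ty"
    right: "Ty"


@dataclass(frozen=True)
class Arr:
    """tau repeated n times, for a numeral n."""
    elem: "Ty"
    count: int


@dataclass(frozen=True)
class Or:
    """tau_1 v tau_2 (union)."""
    left: "Ty"
    right: "Ty"


Ty = Union[B, Prod, Arr, Or]


def size(t: Ty) -> Optional[int]:
    """Return i when every value of t is i bytes (t : T_i); None when sizes vary."""
    if isinstance(t, B):
        return t.width
    if isinstance(t, Prod):
        l, r = size(t.left), size(t.right)
        return None if l is None or r is None else l + r
    if isinstance(t, Arr):
        e = size(t.elem)
        return None if e is None else t.count * e
    l, r = size(t.left), size(t.right)   # union: defined only if both sides agree
    return l if l == r else None
```

`size(Or(B(0), B(4)))` is `None`, matching the observation that $B_0 \vee B_4$ contains values of two different sizes, while `size(Arr(B(4), 3))` is 12, an instance of $T_{ni}$.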

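Pointers to code (and in particular the labels associated with MiniTALT instruction blocks) have arrow types of the form $\Gamma \to 0$, where $\Gamma$ is a register file type. It is safe to jump to a pointer of type $\Gamma \to 0$ if the current register state has type $\Gamma$. Pointers to data in the heap have type $\mathtt{box}(\tau)$; pointers to mutable data in the heap have type $\mathtt{mbox}(\tau)$. The type $\mathtt{sptr}(\tau)$ describes a pointer into the stack. Universal quantification $\forall\alpha{:}K.\tau$ and existential quantification $\exists\alpha{:}K.\tau$ have their usual meanings, as do recursive ($\mu\alpha.\tau$), intersection ($\wedge$) and union ($\vee$) types.

$$
\begin{array}{ll}
\textbf{Judgment} & \textbf{Meaning} \\
\Delta \vdash c : K & c \text{ has kind } K \\
\Delta \vdash \Gamma & \Gamma \text{ is well-formed} \\
\Delta \vdash \Gamma_1 \leq \Gamma_2 & \Gamma_1 \text{ is a subtype of } \Gamma_2 \\
\Delta \vdash \tau_1 \leq \tau_2 & \tau_1 \text{ is a subtype of } \tau_2 \\
\Delta; \Psi; \Gamma \vdash o : \tau & \text{operand } o \text{ has type } \tau \\
\Delta; \Psi; \Gamma \vdash d : \tau \to \Gamma' & \text{propagating a value of type } \tau \text{ to } d \text{ yields } \Gamma' \\
\Delta; \Psi; \Gamma \vdash I & I \text{ is well-typed} \\
\Delta; \Psi \vdash I : \tau\ \mathsf{block} & I \text{ constitutes a block of type } \tau \\
\vdash P & P \text{ is well-typed}
\end{array}
$$

Table 2.2: MiniTALT Typing Judgment Forms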

The type $\mathtt{set}_{=}(x)$, where $x$ is a number term, is a singleton type whose sole element is the word-sized binary representation of the number denoted by $x$. (If the number is not representable, then $\mathtt{set}_{=}(x)$ is an empty type.) The subrange types $\mathtt{set}_{<}(x)$ and $\mathtt{set}_{>}(x)$ have as their elements the word-sized unsigned representations of numbers less than $x$ and greater than $x$, respectively. In TALT the singleton and subrange types are used mainly for array bounds checking and the implementation of disjoint union types; in TALT-R I will have another important use for the singleton type.
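Membership in these types is easy to state for 32-bit words. The Python sketch below gives the membership tests, with the function names hypothetical and words modeled as Python integers. Note that $\mathtt{set}_{=}(x)$ comes out empty when $x$ has no word representation, as stated above.

```python
W = 4                        # word size in bytes (IA-32)
WORD_LIMIT = 2 ** (8 * W)    # number of distinct word values


def representable(n: int) -> bool:
    """True if n has a word-sized unsigned binary representation."""
    return 0 <= n < WORD_LIMIT


def in_set_eq(word: int, x: int) -> bool:
    """word : set=(x). The type is empty when x is not representable."""
    return representable(x) and word == x


def in_set_lt(word: int, x: int) -> bool:
    """word : set<(x), i.e. word represents a number less than x."""
    return representable(word) and word < x


def in_set_gt(word: int, x: int) -> bool:
    """word : set>(x), i.e. word represents a number greater than x."""
    return representable(word) and word > x
```

Roughly, this is what makes subrange types useful for bounds checking: an index word known to belong to $\mathtt{set}_{<}(n)$ is in range for an array of $n$ elements.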

2.3.3  Instruction  Typing 

Since a MiniTALT program consists of a set of labeled instruction sequences, the central judgment in the type system is the one pertaining to instruction sequences. The judgment

$$\Delta; \Psi; \Gamma \vdash I$$

means that in the context consisting of the kinding assumptions $\Delta$, the memory type $\Psi$ and the register file type $\Gamma$, the instruction sequence $I$ is well-typed. In effect, it states that the sequence $I$ is safe to execute when the heap is of the form described by $\Psi$ and the registers contain values of the types described by $\Gamma$.

Some of the rules defining this judgment are shown in Figure 2.7; these rules make heavy use of auxiliary typing judgments for operands and destinations, whose meanings are summarized in Table 2.2. One of the simplest instruction typing rules in the system is the one for the mov instruction, the first rule in the figure. This rule states that the instruction sequence consisting of mov d, o followed by I is well-typed if:

• the operand o is well-typed (with type $\tau$), and

• the destination d is well-formed, and $\Gamma'$ describes the state of the registers after propagating a value of type $\tau$ to d, and

• the continuation I is well-typed under $\Gamma'$.
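The mov rule, and the closely related add rule discussed below, can be read as functions that take the current register file type to the one under which the continuation is checked. The Python sketch below shows that reading for a toy fragment with register destinations, integer literals and types represented as plain strings. All names are hypothetical. It also uses syntactic equality of type names where MiniTALT would use subtyping.

```python
from typing import Dict, List, Tuple, Union

RegFileTy = Dict[str, str]     # register name -> type, types as plain strings
Operand = Union[int, str]      # toy subset: integer literals and register names


def type_operand(gamma: RegFileTy, o: Operand) -> str:
    """Toy operand judgment: literals have type B4, registers their Gamma type."""
    return "B4" if isinstance(o, int) else gamma[o]


def propagate(gamma: RegFileTy, d: str, tau: str) -> RegFileTy:
    """Toy destination judgment for a register d: the result is Gamma{d : tau}."""
    return {**gamma, d: tau}


def check_mov(gamma: RegFileTy, d: str, o: Operand) -> RegFileTy:
    """mov d, o: returns the Gamma' under which the continuation is checked."""
    return propagate(gamma, d, type_operand(gamma, o))


def check_add(gamma: RegFileTy, d: str, o1: Operand, o2: Operand) -> RegFileTy:
    """add d, o1, o2: both operands must be B4, and d receives a B4."""
    for o in (o1, o2):
        if type_operand(gamma, o) != "B4":
            raise TypeError(f"add operand {o!r} does not have type B4")
    return propagate(gamma, d, "B4")


def check_seq(gamma: RegFileTy, instrs: List[Tuple]) -> RegFileTy:
    """Thread Gamma through a straight-line sequence of mov/add instructions."""
    for op, *args in instrs:
        gamma = check_mov(gamma, *args) if op == "mov" else check_add(gamma, *args)
    return gamma
```

`check_seq` makes the third premise of the rule concrete: each instruction's output $\Gamma'$ is the input for checking the rest of the sequence.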
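$$
\frac{\Delta;\Psi;\Gamma \vdash o : \tau \qquad \Delta;\Psi;\Gamma \vdash d : \tau \to \Gamma' \qquad \Delta;\Psi;\Gamma' \vdash I}
     {\Delta;\Psi;\Gamma \vdash \mathtt{mov}\ d, o\ I}
\qquad
\frac{\Delta;\Psi;\Gamma \vdash o_1 : B_4 \qquad \Delta;\Psi;\Gamma \vdash o_2 : B_4 \qquad \Delta;\Psi;\Gamma \vdash d : B_4 \to \Gamma' \qquad \Delta;\Psi;\Gamma' \vdash I}
     {\Delta;\Psi;\Gamma \vdash \mathtt{add}\ d, o_1, o_2\ I}
$$

$$
\frac{\Delta;\Psi;\Gamma \vdash o : \Gamma \to 0}
     {\Delta;\Psi;\Gamma \vdash \mathtt{jmp}\ o\ I}
$$

$$
\frac{(\Gamma(\mathtt{esp}) = \tau_s) \qquad \Delta;\Psi;\Gamma \vdash o : \tau \qquad \Delta;\Psi;\Gamma\{\mathtt{esp} : \tau \times \tau_s\} \vdash I}
     {\Delta;\Psi;\Gamma \vdash \mathtt{push}\ o\ I}
\qquad
\frac{(\Gamma(\mathtt{esp}) = \tau_s) \qquad \Delta;\Psi;\Gamma \vdash \ell : \Gamma_r \to 0 \qquad \Delta;\Psi;\Gamma \vdash o : \Gamma\{\mathtt{esp} : (\Gamma_r \to 0) \times \tau_s\} \to 0}
     {\Delta;\Psi;\Gamma \vdash \mathtt{call}\ o\ \ell}
$$

Figure 2.7: Selected Instruction Typing Rules.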


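$$
\begin{array}{llcl}
\text{Values} & v & ::= & B \mid \ell \mid \mathtt{sptr}(n) \\
\text{Heap values} & V & ::= & I \mid (v_1, \ldots, v_n) \\
\text{Memories} & H & ::= & \{\ell_1 \mapsto V_1, \ldots, \ell_n \mapsto V_n\} \\
\text{Flags} & b & ::= & 0 \mid 1 \\
\text{Flag sets} & \phi & ::= & \{\mathtt{cf} \mapsto b_c, \mathtt{zf} \mapsto b_z, \mathtt{sf} \mapsto b_s, \mathtt{of} \mapsto b_o\} \\
\text{Register files} & R & ::= & \{\mathtt{eax} \mapsto v_{\mathtt{eax}}, \ldots, \mathtt{ebp} \mapsto v_{\mathtt{ebp}}, \mathit{flags} \mapsto \phi\} \\
\text{Machine configurations} & M & ::= & (H, V_s, R, I)
\end{array}
$$

Figure 2.8: MiniTALT Abstract Machine Configurations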


The add rule is similar to the mov rule, except that the two operands must both have type $B_4$ (that is, produce 32-bit integer values) and the value propagated to the destination has type $B_4$ as well.

The typing of control transfer instructions is illustrated by the typing rule for jmp. It states that the instruction jmp o is well-typed if the operand o has type $\Gamma \to 0$, where $\Gamma$ is the current register file type. In other words, (the value of) o must be a pointer to code that is safe to execute under precisely those conditions that happen to hold at the time.

The rule for push, shown in Figure 2.7 alongside the rule for call, illustrates the typing of stack manipulation instructions. If the stack has type $\tau_s$, and the operand o has type $\tau$, then the effect of push o is to append the value of o to the existing stack, producing a stack of type $\tau \times \tau_s$; the continuation I must be well-typed assuming the stack has this new type.

The IA-32 function call instruction takes a single operand, which is the address of a function. The MiniTALT call instruction pushes the specified return address label onto the stack and jumps to the start of the function. Thus call is essentially a combination of push and jmp. The typing rule captures this: in order for the one-instruction sequence call o ℓ to be well-typed, the return label ℓ must have some code type $\Gamma_r \to 0$, and the code pointed to by the value of o must be safe to execute in the state that results from pushing ℓ onto the stack.
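The reading of call as push followed by jmp can be checked directly by composing toy versions of those two rules. In the Python sketch below, `code(gamma)` stands for $\Gamma \to 0$, a stack type $\tau \times \tau_s$ is the tuple `("prod", tau, tau_s)`, and the stack pointer's entry is assumed to be present in every register file type. The names and representations are hypothetical. Types are again compared by syntactic equality rather than subtyping.

```python
from typing import Any, Dict

Ty = Any                      # toy types: strings or tagged tuples
RegFileTy = Dict[str, Ty]


def code(gamma: RegFileTy) -> Ty:
    """The code type Gamma -> 0."""
    return ("code", gamma)


def check_push(gamma: RegFileTy, operand_ty: Ty) -> RegFileTy:
    """push o: with Gamma(esp) = tau_s, continue with esp : tau x tau_s."""
    return {**gamma, "esp": ("prod", operand_ty, gamma["esp"])}


def check_jmp(gamma: RegFileTy, target_ty: Ty) -> None:
    """jmp o: o must be code that is safe under exactly the current Gamma."""
    if target_ty != code(gamma):
        raise TypeError("jump target does not accept the current register file type")


def check_call(gamma: RegFileTy, target_ty: Ty, ret_label_ty: Ty) -> None:
    """call o l: l must be code; o must accept Gamma with l pushed on the stack."""
    if not (isinstance(ret_label_ty, tuple) and ret_label_ty[0] == "code"):
        raise TypeError("return label must have a code type Gamma_r -> 0")
    check_jmp(check_push(gamma, ret_label_ty), target_ty)
```

A callee whose type expects the return label on top of the caller's stack is accepted, while a callee expecting the caller's unmodified register file type is rejected, because the push has changed the type of the stack.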

The typing rules selected for discussion here cover most of the main ideas at work in the MiniTALT type system. The complete set of typing rules, including rules for kinding, subtyping and auxiliary judgments, can be found in Appendix A.
2.3.4 Operational Semantics

The dynamic semantics of MiniTALT is defined in terms of the abstract machine whose configurations have the form shown in Figure 2.8. Because the purpose of MiniTALT is to support clear explanation of key concepts within the pages of this thesis, rather than to play a direct role in the certification of IA-32 programs, the semantics I present here is in the style of TAL [47] and STAL [46], rather than the more realistic but more complicated "official" semantics of TALT. A configuration of the MiniTALT machine consists of a memory ($H$), a stack ($V_s$), a register file ($R$), and an instruction sequence ($I$). The memory maps locations, which are abstract pointer values not confusable with integers, to heap values, each of which is either an instruction sequence or a sequence of word-sized values. The stack is also a heap value. The register file associates a word-sized value with each of the machine's general-purpose register names and a bit with each of the four status flags commonly used for conditional jumps; the stack pointer register implicitly points to the beginning of the stack value $V_s$. A pointer into the stack has the form $\mathtt{sptr}(n)$; this is a word-sized value and represents the location $n$ bytes away from the base of the stack.
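A small executable model helps to see how these configurations evolve. The Python sketch below implements the push, jmp, call and ret rules of Table 2.3 and a register-only version of mov, for a toy configuration in which the memory holds only code blocks and stack pointers are omitted. `Label`, `Config`, `resolve` and `step` are hypothetical names. Any state the sketch does not model raises an error, standing in for a stuck configuration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Label:
    """An abstract code location, not confusable with an integer."""
    name: str


Value = Union[int, Label]      # words and labels (sptr values omitted in this toy)
Instr = Tuple                  # e.g. ("push", 3), ("call", Label("f"), Label("k"))


@dataclass
class Config:
    """A toy (H, Vs, R, I); here the memory H holds only code blocks."""
    heap: Dict[Label, List[Instr]]
    stack: List[Value]         # stack[0] is the top of the stack
    regs: Dict[str, Value]
    instrs: List[Instr]


def resolve(m: Config, o) -> Value:
    """Operand resolution: a register name reads R; literals and labels are themselves."""
    return m.regs[o] if isinstance(o, str) else o


def step(m: Config) -> Config:
    """One transition, for a handful of the rules in Table 2.3."""
    op, *args = m.instrs[0]
    rest = m.instrs[1:]
    if op == "mov":                                   # register destinations only
        d, o = args
        return Config(m.heap, m.stack, {**m.regs, d: resolve(m, o)}, rest)
    if op == "push":
        return Config(m.heap, [resolve(m, args[0])] + m.stack, m.regs, rest)
    if op == "jmp":
        return Config(m.heap, m.stack, m.regs, m.heap[resolve(m, args[0])])
    if op == "call":                                  # push return label, then jump
        o, ret = args
        return Config(m.heap, [ret] + m.stack, m.regs, m.heap[resolve(m, o)])
    if op == "ret":                                   # pop a label and jump to it
        top, *below = m.stack
        return Config(m.heap, below, m.regs, m.heap[top])
    raise ValueError(f"no transition modeled for {op!r}")
```

Stepping `call f k` with a function body `mov eax, 42; ret` leaves 42 in eax, an empty stack, and control at the code for `k`, just as the call and ret rows of the table prescribe.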

The (single-step) transition relation $\mapsto$ on machine configurations is defined in Table 2.3. The conditions in the second column of the table are stated in terms of auxiliary judgments for resolving operands and propagating values to destinations; the rules for these judgments can be found in Appendix A.2. The addition and subtraction operations $\oplus$ and $\ominus$ correspond to the 32-bit binary arithmetic performed by an IA-32 processor, and specify both the resulting word and the new values of the status flags; the definitions of these operations are omitted, as are the formal definitions of the condition satisfaction relations $\phi \models \kappa$. Informally, $\phi \models \kappa$ if the status flag values $\phi$ indicate that the condition $\kappa$ holds of the most recent arithmetic operation. For details of how these things are determined, see either the TALT paper [14] or the IA-32 manual [38].
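For readers who want something concrete in place of the omitted definitions, the Python sketch below gives one plausible rendering of $\oplus$, $\ominus$ and $\phi \models \kappa$ for 32-bit unsigned operands. It follows the IA-32 definitions of the carry, zero, sign and overflow flags and of the condition codes in Figure 2.5. The function names are hypothetical, and this is an illustration rather than the formal definitions used in the thesis.

```python
from typing import Dict, Tuple

MASK = 0xFFFFFFFF
SIGN = 0x80000000

Flags = Dict[str, int]   # cf, zf, sf, of -> 0 or 1


def _zf_sf(result: int) -> Tuple[int, int]:
    return int(result == 0), int(bool(result & SIGN))


def add32(a: int, b: int) -> Tuple[int, Flags]:
    """B1 (+) B2: the 32-bit sum plus the status flags an IA-32 add sets."""
    full = a + b
    r = full & MASK
    zf, sf = _zf_sf(r)
    of = int((a & SIGN) == (b & SIGN) and (r & SIGN) != (a & SIGN))
    return r, {"cf": int(full > MASK), "zf": zf, "sf": sf, "of": of}


def sub32(a: int, b: int) -> Tuple[int, Flags]:
    """B1 (-) B2: the 32-bit difference plus the flags set by sub and cmp."""
    r = (a - b) & MASK
    zf, sf = _zf_sf(r)
    of = int((a & SIGN) != (b & SIGN) and (r & SIGN) != (a & SIGN))
    return r, {"cf": int(a < b), "zf": zf, "sf": sf, "of": of}


CONDITIONS = {
    "e":  lambda f: f["zf"] == 1,
    "ne": lambda f: f["zf"] == 0,
    "b":  lambda f: f["cf"] == 1,
    "be": lambda f: f["cf"] == 1 or f["zf"] == 1,
    "a":  lambda f: f["cf"] == 0 and f["zf"] == 0,
    "ae": lambda f: f["cf"] == 0,
    "o":  lambda f: f["of"] == 1,
    "no": lambda f: f["of"] == 0,
}


def satisfies(flags: Flags, kappa: str) -> bool:
    """phi |= kappa for the conditions of Figure 2.5."""
    return CONDITIONS[kappa](flags)
```

As a usage example, `cmp 3, 5` sets the flags from `sub32(3, 5)`, after which the condition b (unsigned "below") holds and ae does not. This is the pattern that cmpjcc exploits for unsigned bounds checks.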

2.4 Chapter Summary

This chapter constitutes the background on Crary-Sarkar metalogical code certification necessary to understand the balance of the thesis, including specifics of the type system TALT. TALT is a type system for an abstract machine closely related to a safe subset of the IA-32 architecture, and is the primary exemplar of code certification using the Twelf metalogic. I have described the mechanisms of certification and verification of TALT programs, including the roles of the related systems XTALT and EXTALT, and given a more detailed presentation of the variant called MiniTALT that stands in for TALT in the bulk of the thesis.



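$$
\begin{array}{lll}
\text{If } I = \ldots & \text{and } \ldots & \text{then } (H, V_s, R, I) \mapsto (H', V_s', R', I') \text{ where } \ldots \\[6pt]
\mathtt{add}\ d, o_1, o_2\ I' & H, V_s, R \vdash o_1 \leadsto B_1,\ H, V_s, R \vdash o_2 \leadsto B_2,\ B_1 \oplus B_2 = (B_3, \phi) & H, V_s, R \vdash d(B_3) \leadsto H', V_s', R_1,\ R' = R_1\{\mathit{flags} \mapsto \phi\} \\
\mathtt{mov}\ d, o\ I' & H, V_s, R \vdash o \leadsto v & H, V_s, R \vdash d(v) \leadsto H', V_s', R' \\
\mathtt{sub}\ d, o_1, o_2\ I' & H, V_s, R \vdash o_1 \leadsto B_1,\ H, V_s, R \vdash o_2 \leadsto B_2,\ B_1 \ominus B_2 = (B_3, \phi) & H, V_s, R \vdash d(B_3) \leadsto H', V_s', R_1,\ R' = R_1\{\mathit{flags} \mapsto \phi\} \\
\mathtt{cmp}\ o_1, o_2\ I' & H, V_s, R \vdash o_1 \leadsto B_1,\ H, V_s, R \vdash o_2 \leadsto B_2,\ B_1 \ominus B_2 = (B_3, \phi) & H' = H,\ V_s' = V_s,\ R' = R\{\mathit{flags} \mapsto \phi\} \\
\mathtt{jmp}\ o\ I_0 & H, V_s, R \vdash o \leadsto \ell,\ H(\ell) = I' & H' = H,\ V_s' = V_s,\ R' = R \\
\mathtt{call}\ o\ \ell_r & H, V_s, R \vdash o \leadsto \ell,\ H(\ell) = I' & H' = H,\ R' = R,\ V_s' = \ell_r @ V_s \\
\mathtt{jcc}\ \kappa, o\ I_0 & R(\mathit{flags}) \models \kappa,\ H, V_s, R \vdash o \leadsto \ell,\ H(\ell) = I' & H' = H,\ V_s' = V_s,\ R' = R \\
\mathtt{jcc}\ \kappa, o\ I_0 & R(\mathit{flags}) \not\models \kappa & H' = H,\ V_s' = V_s,\ R' = R,\ I' = I_0 \\
\mathtt{cmpjcc}\ o_1, o_2, \kappa, o_3\ I_0 & (H, V_s, R, \mathtt{cmp}\ o_1, o_2\ \mathtt{jcc}\ \kappa, o_3\ I_0) \mapsto^2 (H', V_s', R', I') & \\
\mathtt{pop}\ d\ I' & V_s = v @ V_{s0} & H, V_{s0}, R \vdash d(v) \leadsto H', V_s', R' \\
\mathtt{push}\ o\ I' & H, V_s, R \vdash o \leadsto v & H' = H,\ R' = R,\ V_s' = v @ V_s \\
\mathtt{ret}\ I_0 & V_s = \ell @ V_s',\ H(\ell) = I' & H' = H,\ R' = R \\
\mathtt{salloc}\ n\ I' & n = mW & H' = H,\ V_s' = (v_1, \ldots, v_m) @ V_s,\ R' = R\ (v_1, \ldots, v_m \text{ arbitrary values}) \\
\mathtt{sfree}\ n\ I' & V_s = V @ V_s',\ |V| = n & H' = H,\ R' = R \\
\mathtt{malloc}\ d, n\ I' & n = mW,\ \ell \notin \mathrm{dom}(H),\ H_1 = H\{\ell \mapsto (v_1, \ldots, v_m)\} & H_1, V_s, R \vdash d(\ell) \leadsto H', V_s', R'
\end{array}
$$

Table 2.3: MiniTALT Abstract Machine Evaluation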


Chapter  3 

TALT-R:  A  Typed  Assembly  Language 
for  Responsiveness 


The central claim of this thesis is that static enforcement of timing policies using type systems is possible. In this chapter I begin to offer support for that claim by describing a TALT-like type system that allows a range of timing policies to be certified within the metalogical framework described in Chapter 2. Karl Crary has suggested [11] that my work be considered the next version of TALT and essentially replace it; however, until that happens it will be useful to distinguish Crary's TALT from my own. Therefore, for the time being I call my assembly language TALT-R (for "Responsiveness").

Like  Talt,  Talt-R  is  actually  a  number  of  different,  but  closely  related,  languages  that  play 
different  roles  in  the  certified  code  process.  The  three  most  prominent  are: 

•  Talt-R itself, which is a Curry-style type system in which type-checking is presumed
undecidable. Talt-R is analogous to Crary's Talt, the language for which Crary and Sarkar
directly proved a safety metatheorem. I have not undertaken a formal safety proof for Talt-R,
but I am confident in the conjecture that such a proof would be a mostly straightforward
extension of the proof for Talt.

•  XTALT-R  (analogous  to  XTALT),  which  is  an  explicitly-typed  version  of  Talt-R  for  which 
type-checking  is  tractable.  The  certificate  for  a  Talt-R  program  is  an  XTALT-R  program. 

•  EXTALT-R (analogous to EXTALT), which is the external language of the Talt-R assembler
(and therefore also the direct target of high-level language compilers using Talt-R for
certification).

However,  as  I  explained  for  their  Talt  analogues  in  Chapter  2,  none  of  these  three  is  a  particularly 
good  language  to  use  when  formally  describing  a  compiler  as  I  must  do  in  this  thesis.  Therefore, 
this  chapter  introduces  the  core  language  MiniTALT-R,  which  extends  the  MiniTALT  of  Chapter  2 
in  just  the  same  way  that  TALT-R  extends  TALT. 

To make the discussion concrete, I start by describing a specific timing policy that will serve
as  the  main  motivating  example  for  the  design  of  TALT-R  (this  chapter  and  Chapter  4)  and  my 
compiler  implementation  (Chapters  5,  6  and  7).  In  Chapter  8  I  will  finally  leave  this  particular 
example  behind  and  explore  the  range  of  policies  that  can  be  certified  using  Talt-R. 
3.1  A  Responsiveness  Policy

As discussed in Section 1.2, periodic yielding is an important timing requirement that must be
enforced by operating systems and other kinds of supervisors in multithreaded applications. A
process that fails to return control to the scheduler promptly can disrupt the behavior of other
processes or bring the entire system to a halt. It seems, therefore, that any proposal for static
enforcement of timing policies as an alternative to dynamic enforcement by pre-emptive scheduling
must address the issue of cooperation among user processes.

Because  of  the  effects  non-conformant  programs  can  have  on  other  processes,  I  choose  to  call 
this  "cooperativeness"  requirement  responsiveness.  I  state  it  semi-formally  as  follows: 

I assume there is some system-specific set of operations, the yielding operations, that certified
programs are expected to perform with at least a certain frequency. In particular, I assume that,
for some large integer Y chosen in advance, a certified program must never execute more than
Y non-yielding instructions in a row.

The  specific  set  of  yielding  operations  will  vary  between  systems.  In  the  archetypical  example  of 
an  operating  system,  any  system  call  that  gives  the  kernel  an  opportunity  to  deschedule  the  user 
process  will  count  as  a  yield.  In  other  scenarios,  such  as  user-level  thread  schedulers,  application 
plugin  frameworks,  or  mobile  code  host  environments,  the  interface  presented  to  untrusted  code 
will  be  application-defined  and  so  will  the  designation  of  some  of  the  available  procedures  as 
"yielding." 

For  the  purposes  of  the  exposition  in  this  thesis,  I  will  assume  there  is  exactly  one  yielding 
operation,  which  I  simply  call  "yield".  This  procedure  is  called  from  a  Talt-R  program  by  a  new 
yield  instruction,  implemented  as  a  function  call.  The  type  system  of  Talt-R  will  therefore  be 
designed to enforce the simple policy that no more than Y instructions are ever executed between
two successive yields.

3.2  MiniTALT-R 

The remainder of this chapter describes the core language MiniTALT-R, which extends the
MiniTALT of Chapter 2. Because the two languages, abstract machines and type systems are so closely
related, my presentation in this chapter will cover only the differences between them, i.e. the
extensions and refinements that distinguish MiniTALT-R from MiniTALT. A complete formal
definition of MiniTALT-R is given in Appendix B.

Since  the  design  of  this  new  language  is  motivated  by  the  desire  to  certify  programs  with 
respect  to  a  particular  safety  policy,  I  begin  with  the  components  of  the  design  that  implement 
that  policy:  the  extension  of  the  instruction  set  to  include  a  yielding  operation,  and  the  refinement 
of  the  operational  semantics  that  makes  it  unsafe  to  run  too  long  without  yielding.  After  laying 
this  groundwork  I  will  explain  the  extensions  and  refinements  to  the  type  assignment  system  that 
are  needed  to  guarantee  programs  are  safe. 

3.2.1  New  Instructions 

The MiniTALT-R programming language extends that of MiniTALT with just two new
instructions. Formally, the grammar for instruction sequences gets two new productions:

$$\text{Instruction Sequences}\quad I ::= \cdots \mid \mathtt{subjae}\ r_d, o_1, o_2, o_3\ I \mid \mathtt{yield}\ I$$
The yield instruction is the key addition: it is this instruction that must be executed with at least
a certain frequency under the new safety policy. The other new instruction is subjae ("subtract
and jump if above or equal"), a compound instruction composed of a subtraction and a
conditional jump. The instruction sequence subjae $r_d, o_1, o_2, o_3$ $I$ has the same operational behavior
as (sub $r_d, o_1, o_2$; jcc ae, $o_3$; $I$): it subtracts $o_2$ from $o_1$, stores the result in $r_d$, and jumps to $o_3$ if
$o_1$ is greater than or equal to $o_2$ (interpreting these values as unsigned integers). A special
typing rule reflects the result of the conditional jump into the type system. In this sense, subjae is
related to the cmpjcc instruction inherited from TALT. The addition of subjae to the language
may seem arbitrary at this point, but its usefulness in producing safe-but-efficient programs will
become clear later on.
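The operational behavior of subjae can be sketched by modelling the IA-32 sub instruction and the jcc ae condition on unsigned words. This is a minimal Python sketch under stated assumptions: the word size is fixed at 32 bits, registers and flags are reduced to return values, and the function names are hypothetical rather than part of any Talt-R implementation.

```python
WORD_BITS = 32              # IA-32 word size (W = 4 bytes); assumed for this sketch
MASK = (1 << WORD_BITS) - 1


def sub_with_flags(o1, o2):
    """Wraparound subtraction o1 - o2 plus the carry (borrow) flag, as IA-32 'sub'."""
    result = (o1 - o2) & MASK
    carry = o1 < o2         # CF is set exactly when the unsigned subtraction borrows
    return result, carry


def subjae(o1, o2):
    """Model 'subjae rd, o1, o2, o3' as 'sub rd, o1, o2; jcc ae, o3'.

    Returns (value written to rd, whether control jumps to o3).
    'jae' (jump if above or equal) branches exactly when CF is clear.
    """
    o1, o2 = o1 & MASK, o2 & MASK
    rd, carry = sub_with_flags(o1, o2)
    return rd, not carry
```

For example, subjae(5, 3) writes 2 and takes the jump, while subjae(3, 5) writes the wrapped value 2^32 - 2 and falls through.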

3.2.2  The  MiniTALT-R  Abstract  Machine 

As  discussed  earlier  (Section  2.1),  the  operational  semantics  with  respect  to  which  the  type  safety 
theorem  is  proved  comprises  the  safety  policy  in  a  foundational  certification  system.  Thus,  to 
certify that programs yield at least once every Y instructions, I must provide an operational
semantics in which any valid execution necessarily obeys this policy and a type system that is sound
with  respect  to  that  semantics. 

The dynamic semantics of MiniTALT-R are defined in terms of the MiniTALT-R abstract
machine, which is a refinement of the MiniTALT abstract machine. The key change is the addition of
a  virtual  clock  register  to  the  register  file: 

$$\text{Register files}\quad R ::= \{\ldots, \mathtt{ck} = n\} \quad (n \geq 0)$$

The  value  of  the  virtual  clock  is  a  nonnegative  integer.  Any  non-yielding  instruction  executed  by 
the  MiniTALT-R  abstract  machine  decrements  the  virtual  clock,  while  the  yielding  instruction  sets 
the  virtual  clock  to  Y. 

These properties of the virtual clock capture the essence of the responsiveness policy. Since
the virtual clock can never be negative, any machine state in which the next instruction is
non-yielding but the virtual clock is zero is stuck (and thus forbidden). Therefore, any safe execution
starting from a state where ck = n must perform at least one yield in its first n + 1 steps. Since
the yield instruction sets the clock to Y, it follows that successive yields must occur no more
than Y instructions apart.
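To illustrate the clock discipline just described, the following Python sketch replays a trace of instruction names against the virtual clock. It is a toy model, not the MiniTALT-R abstract machine: instructions are bare strings, "yield" is assumed to be the only yielding operation, every other instruction is charged exactly one tick (compound instructions are not modelled), and the helper names run_clock, is_safe and longest_nonyielding_run are hypothetical. The final loop checks, on a few traces, that a run starting from ck = Y gets stuck exactly when some stretch of non-yielding instructions is longer than Y.

```python
class Stuck(Exception):
    """Raised when a non-yielding instruction is attempted while ck = 0."""

    def __init__(self, step):
        super().__init__(f"machine stuck at step {step}")
        self.step = step


def run_clock(trace, Y, ck):
    """Replay a trace of instruction names against the virtual clock.

    'yield' resets the clock to Y; every other instruction needs ck >= 1
    and decrements it by one. Returns the final clock value.
    """
    if ck < 0:
        raise ValueError("the virtual clock is a nonnegative integer")
    for step, instr in enumerate(trace):
        if instr == "yield":
            ck = Y
        elif ck == 0:
            raise Stuck(step)
        else:
            ck -= 1
    return ck


def is_safe(trace, Y, ck):
    """True if the machine never gets stuck while executing the trace."""
    try:
        run_clock(trace, Y, ck)
    except Stuck:
        return False
    return True


def longest_nonyielding_run(trace):
    """Length of the longest stretch of consecutive non-yielding instructions."""
    run = best = 0
    for instr in trace:
        run = 0 if instr == "yield" else run + 1
        best = max(best, run)
    return best


# Starting from ck = Y, a trace is safe exactly when no run of
# non-yielding instructions is longer than Y.
Y = 3
for trace in (["add"] * 3, ["add"] * 4, ["add", "yield"] * 5 + ["add"] * 3):
    assert is_safe(trace, Y, ck=Y) == (longest_nonyielding_run(trace) <= Y)
```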

This  method  of  instruction  counting  is  not  new:  Necula  and  Lee  proposed  the  use  of  a  virtual 
clock  for  proof-carrying  code  [51],  and  Crary  and  Weirich  used  one  in  their  languages  LXres  and 
TALres  [18].  Unlike  these  others,  however,  I  am  not  attempting  to  bound  total  running  time;  I  am 
only  interested  in  bounding  the  time  until  the  next  yield. 


3.3  Static  Semantics 

The  type  system  of  MiniTALT-R,  like  its  instruction  set  and  abstract  machine,  is  arrived  at  by 
means  of  a  few  changes  to  its  MiniTALT  counterpart.  The  modifications  to  the  type  system  are 
shown  in  Figure  3.1. 

The first and most important syntactic change is the inclusion of a clock term in every register
file type. The clock term assignment $\mathtt{ck} : t$, where $t$ is a static term of kind $\mathbb{N}$, asserts that the value
of the virtual clock is at least the number denoted by $t$. The kind $\mathbb{N}$ is extended to include formal
sums of the form $t_1 + t_2$; the members of this expanded kind $\mathbb{N}$ will be called constraint terms.
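$$
\begin{array}{llcl}
\text{Static Terms} & c, t, \tau, \Gamma & ::= & \cdots \mid \varphi \Rightarrow \tau \mid S(t) \mid n \ (\text{replaces } B) \mid t_1 + t_2 \\
\text{Constraint Formulas} & \varphi & ::= & t_1 \leq t_2 \mid t_1 = t_2 \\
\text{Static Contexts} & \Delta & ::= & \cdot \mid \Delta, \alpha{:}\kappa \mid \Delta, \varphi\ \mathsf{true} \\
\text{Register File Types} & \Gamma & ::= & \{\mathtt{eax}{:}\tau_{\mathtt{eax}}, \ldots, \mathtt{ebp}{:}\tau_{\mathtt{ebp}}, \mathtt{esp}{:}\sigma, \mathtt{ck}{:}t\}
\end{array}
$$

Figure 3.1: MiniTALT-R Type System Syntax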

Another  important  addition  is  the  class  of  constraint  formulas,  which  are  assertions  of  equality 
or  non-strict  inequality  between  constraint  terms.1  The  constraint  formulas  and  their  role  in  typing 
will  be  discussed  below  (Section  3.3.1)  and  the  proof  theory  of  the  logic  they  comprise  will  be 
explored  in  depth  in  Chapter  4. 

MiniTALT-R adds only one form of type to MiniTALT: the guarded type $\varphi \Rightarrow \tau$ describes values
that may be given type $\tau$ if the formula $\varphi$ is satisfied. As a syntactic convenience, I also introduce
the notation $S(t)$ as a synonym for $\mathit{set}_=(t)$. By convention, I write $\mathit{set}_=(t)$ when using this type
in the implementation of disjoint union types or array bounds checks (i.e., the purposes it serves
in plain Talt), and $S(t)$ when it occurs in the typing of time-keeping idioms.

3.3.1  The  Constraint  Subsystem 

The  purpose  of  the  constraint  terms  and  formulas  is  to  allow  the  type  system  to  reason  about 
the  time  remaining  before  the  next  yield  instruction  must  be  performed.  This  constraint  logic  is 
largely  separable  from  the  rest  of  the  type  system;  in  fact,  there  is  a  certain  degree  of  flexibility  in  its 
design. The version I will describe in this thesis is engineered mostly for clarity of presentation.

As mentioned above, the constraint terms include the natural numbers (written $n$, where $n \geq 0$)
and are closed under addition; the language of formulas contains equality ($t_1 = t_2$) and ordering
($t_1 \leq t_2$) on constraint terms. It would be a simple matter to add propositional connectives ($\wedge$, $\vee$, $\supset$,
$\bot$) to the constraint logic; however, there is surprisingly little need for them to enforce the simple
responsiveness policy of TALT-R. I therefore leave them out of this presentation for simplicity.
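One way to picture the constraint terms and formulas is as a small abstract syntax tree. The Python sketch below is illustrative only: the class names are hypothetical, and holds merely evaluates a formula under one assignment of numbers to variables, which is not the same as the provability judgment $\Delta \vdash \varphi$ true whose rules are given in Chapter 4.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping, Union


@dataclass(frozen=True)
class Nat:
    """A numeral n (n >= 0)."""
    n: int


@dataclass(frozen=True)
class Var:
    """A static variable alpha of kind N, bound in the context Delta."""
    name: str


@dataclass(frozen=True)
class Add:
    """A formal sum t1 + t2."""
    left: Term
    right: Term


Term = Union[Nat, Var, Add]


@dataclass(frozen=True)
class Le:
    """The formula t1 <= t2."""
    left: Term
    right: Term


@dataclass(frozen=True)
class Eq:
    """The formula t1 = t2."""
    left: Term
    right: Term


Formula = Union[Le, Eq]


def denote(t: Term, env: Mapping[str, int]) -> int:
    """The natural number a constraint term denotes under an assignment."""
    if isinstance(t, Nat):
        return t.n
    if isinstance(t, Var):
        return env[t.name]
    return denote(t.left, env) + denote(t.right, env)


def holds(phi: Formula, env: Mapping[str, int]) -> bool:
    """Evaluate a formula under one assignment (not provability in the logic)."""
    a, b = denote(phi.left, env), denote(phi.right, env)
    return a <= b if isinstance(phi, Le) else a == b
```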

The type system of MiniTALT-R defines two judgment forms not present in MiniTALT. Both of these have to do
with constraint formulas; their meanings are summarized in Table 3.1.

$$
\begin{array}{ll}
\textbf{Judgment} & \textbf{Meaning} \\
\Delta \vdash \varphi\ \mathsf{prop} & \varphi \text{ is a well-formed constraint formula.} \\
\Delta \vdash \varphi\ \mathsf{true} & \text{The constraint } \varphi \text{ is true.}
\end{array}
$$

Table 3.1: New Typing Judgments of MiniTALT-R

The judgment $\Delta \vdash \varphi$ prop means that in
context $\Delta$, the formula $\varphi$ is well-formed. The rules for this judgment (along with two relevant
kinding rules) are given in Figure 3.2. Note that a formula need not be "true" in order to be
well-formed.

The notion of "truth" for constraint formulas is captured by the other new judgment form:
the judgment $\Delta \vdash \varphi$ true means that the truth of the formula $\varphi$ follows from the assumptions
in $\Delta$. Note that according to Figure 3.1, $\Delta$ may contain both kinding assumptions of the form
$\alpha{:}\kappa$ and hypotheses of the form $\varphi$ true. The inference rules defining the truth judgment are given
in Chapter 4 along with extensive discussion of the rationale for their design and of their
proof-theoretic consequences. They capture a useful, if naive, theory of addition of natural numbers that
allows all of the idioms discussed in the remainder of this thesis to be certified. (Impatient readers
can find the rules in Figure 4.1.)

1In my thesis proposal [69], the constraint formulas were themselves static terms of a particular kind. Although
aesthetically tempting, this formulation proved not to scale soundly to the full implemented Talt.

$$
\frac{}{\Delta \vdash n : \mathbb{N}}\;(n \geq 0)
\qquad
\frac{\Delta \vdash t_1 : \mathbb{N} \quad \Delta \vdash t_2 : \mathbb{N}}{\Delta \vdash t_1 + t_2 : \mathbb{N}}
\qquad
\frac{\Delta \vdash t_1 : \mathbb{N} \quad \Delta \vdash t_2 : \mathbb{N}}{\Delta \vdash t_1 \leq t_2\ \mathsf{prop}}
\qquad
\frac{\Delta \vdash t_1 : \mathbb{N} \quad \Delta \vdash t_2 : \mathbb{N}}{\Delta \vdash t_1 = t_2\ \mathsf{prop}}
$$

Figure 3.2: Formation Rules for Constraints


3.3.2  The  Virtual  Clock 

Accounting  for  the  virtual  clock  in  the  type  system  is  a  fairly  straightforward  matter.  Talt-r's 
treatment  of  the  clock  is  more  or  less  analogous  to  that  of  TALres.  Register  file  types,  in  addition 
to  giving  types  for  the  machine's  general-purpose  registers  and  the  stack,  give  a  constraint  term 
that conservatively approximates the value of the virtual clock. That is to say, if $\Delta; \Psi; \Gamma \vdash I$,
then the instruction sequence $I$ may safely be executed if the value of the virtual clock is at least
(the number denoted by) $\Gamma(\mathtt{ck})$. Typing rules for mundane instructions involving no control flow
reflect  this  with  some  simple  bookkeeping.  For  example,  the  typing  rule  for  the  add  instruction  is: 


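$$
\frac{\begin{array}{c}
\Delta; \Psi; \Gamma \vdash o_1 : B_4 \qquad \Delta; \Psi; \Gamma \vdash o_2 : B_4 \\
\Delta; \Psi; \Gamma \vdash d : B_4 \to \Gamma' \qquad \Delta; \Psi; \Gamma'\{\mathtt{ck}{:}t\} \vdash I
\end{array}}{\Delta; \Psi; \Gamma \vdash \mathtt{add}\ d, o_1, o_2;\ I}
\;(\Gamma(\mathtt{ck}) = 1 + t)
$$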

Note the two differences from the analogous rule in Talt: First, the side condition requires that
the clock term $\Gamma(\mathtt{ck})$ have the form $1 + t$ for some term $t$, since it is a type error to perform an add
when the clock is zero. Second, the final premise requires that the continuation $I$ be well-typed
assuming only $t$ on the virtual clock, since the add will have used one time unit.

Correspondingly, a code pointer of type $\Gamma' \to 0$ is safe to jump to only if the virtual clock is at
least $\Gamma'(\mathtt{ck})$ after the jump; that is, the clock must read at least one more than $\Gamma'(\mathtt{ck})$ in order for
the jump instruction itself to be safe:

$$
\frac{\Delta; \Psi; \Gamma \vdash o : (\Gamma\{\mathtt{ck}{:}t\}) \to 0}{\Delta; \Psi; \Gamma \vdash \mathtt{jmp}\ o;\ I}
\;(\Gamma(\mathtt{ck}) = 1 + t)
$$


The yield instruction may be performed at any time, and resets the virtual clock to Y:

$$
\frac{\Delta; \Psi; \Gamma\{\mathtt{ck}{:}Y\} \vdash I}{\Delta; \Psi; \Gamma \vdash \mathtt{yield};\ I}
$$
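The bookkeeping performed by the add, jmp and yield rules can be sketched as a tiny checker for straight-line blocks. This Python sketch is hypothetical in every name and in its instruction encoding, and it makes three simplifying assumptions: every clock term is a literal numeral (so the side condition $\Gamma(\mathtt{ck}) = 1 + t$ reduces to ck ≥ 1), the code-pointer subtyping discussed below is folded into the jmp case so that a target may demand less than what remains, and every block must end in jmp.

```python
def check_block(instrs, ck, Y):
    """Check a straight-line block against the clock rules for add, jmp and yield.

    instrs is a list of ("add",), ("yield",) or ("jmp", target_ck) tuples;
    target_ck is the clock term the jump target's register file type demands.
    ck is the literal clock term Gamma(ck) at the start of the block.
    """
    for instr in instrs:
        op = instr[0]
        if op == "yield":
            ck = Y                      # continuation is checked with ck:Y
        elif op == "add":
            if ck < 1:                  # side condition Gamma(ck) = 1 + t fails
                return False
            ck -= 1                     # continuation is checked with ck:t
        elif op == "jmp":
            # Gamma(ck) = 1 + t, and by code-pointer subtyping the target
            # may demand any t' with t' <= t.
            return ck >= 1 and instr[1] <= ck - 1
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return False                        # this sketch requires blocks to end in jmp
```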


The three rules just presented preserve or improve the accuracy of the constraint term $\Gamma(\mathtt{ck})$
with respect to the actual value of ck. In general, though, $\Gamma(\mathtt{ck})$ is an inexact approximation of
the virtual clock. The imprecision is due to TALT-R's rule for register file subtyping, which allows
the constraint term assigned to ck to vary:

$$
\frac{\Delta \vdash t' \leq t\ \mathsf{true} \qquad \Delta \vdash \sigma \leq \sigma' \qquad \Delta \vdash \tau_i \leq \tau_i' \ \ (1 \leq i \leq N)}
{\Delta \vdash \{r_1{:}\tau_1, \ldots, r_N{:}\tau_N, \mathtt{sp}{:}\sigma, \mathtt{ck}{:}t\} \leq \{r_1{:}\tau_1', \ldots, r_N{:}\tau_N', \mathtt{sp}{:}\sigma', \mathtt{ck}{:}t'\}}
$$
According to this rule, a register file type where the virtual clock reads $t$ can be a subtype of
one where it reads $t'$ if the formula $t' \leq t$ can be proved in the constraint logic. Intuitively, the
register file type on the left specifies that the value of the virtual clock is at least $t$; if $t' \leq t$, then
anything that is at least $t$ will also be at least $t'$. The register file type specifying ck:t is a stronger
requirement on the state of the machine, consistent with the usual meaning of subtyping.

Because the register file subtyping rule involves reasoning about the virtual clock, the
subtyping rule for arrow types and the subsumption rule for instruction sequences take on additional
meaning in Talt-R as well. To be specific, the subsumption rule (inherited unchanged from Talt):

$$
\frac{\Delta; \Psi; \Gamma' \vdash I \qquad \Delta \vdash \Gamma \leq \Gamma'}{\Delta; \Psi; \Gamma \vdash I}
$$

now allows an instruction sequence to "forget" about some of the remaining ticks on the virtual
clock. The subtyping rule for code pointer types $\Gamma \to 0$ is contravariant in $\Gamma$ as always:

$$
\frac{\Delta \vdash \Gamma' \leq \Gamma}{\Delta \vdash \Gamma \to 0 \leq \Gamma' \to 0}
$$

Coupled with the register file subtyping rule, this means that a pointer to an instruction sequence
expecting $t$ on the clock may be used in place of one expecting $t'$ if $t \leq t'$. Intuitively speaking, this
is because any subsequent jump to that pointer will have to provide a clock of at least $t'$, which
will be enough since the instruction sequence requires only $t$.
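Restricted to the clock component and to literal clock terms (so that proving $t' \leq t$ is plain integer comparison), the two subtyping rules above reduce to the comparisons below. The Python function names are hypothetical; the sketch only shows the direction of each inequality, including the contravariance of code pointers.

```python
def rf_clock_subtype(t_sub: int, t_super: int) -> bool:
    """Clock premise of register-file subtyping.

    {..., ck:t} <= {..., ck:t'} requires t' <= t: a state whose clock reads
    at least t also reads at least t'.
    """
    return t_super <= t_sub


def code_clock_subtype(t_ptr: int, t_ctx: int) -> bool:
    """(Gamma -> 0) <= (Gamma' -> 0) requires Gamma' <= Gamma (contravariance).

    t_ptr is the clock the pointed-to block expects; t_ctx is the clock
    promised by the type the pointer is used at. Holds when t_ptr <= t_ctx.
    """
    return rf_clock_subtype(t_ctx, t_ptr)
```

For instance, a block that needs only 3 ticks may be used where 5 are guaranteed, but not the other way around.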

I pause here to note that if the premise $\Delta \vdash t' \leq t$ true in the register file subtyping rule were
replaced by $\Delta \vdash t \leq t'$ true, then the sense of the approximation of $R(\mathtt{ck})$ by $\Gamma(\mathtt{ck})$ would be
reversed. That is, a register file type $\Gamma$ would describe machine states in which the value of the
virtual clock was at most $\Gamma(\mathtt{ck})$. If the premise were replaced by $\Delta \vdash t' = t$ true, then the static term
in the register file type would always correspond exactly to the value of the clock. I will discuss
some applications of these variants of the system in Chapter 8.

3.3.3  Guarded  and  Singleton  Types 

There are two forms of type in Talt-R that need to be discussed here: the singleton types ($S(t)$),
which are really the same as the singletons written $\mathit{set}_=(t)$ in TALT but have been endowed with
some new capabilities, and the guarded types ($\varphi \Rightarrow \tau$), which are new. The intuitive meanings of
these  types  are  simple,  but  their  usefulness  may  not  be  obvious  until  I  discuss  yield-placement 
strategies  later  on  in  the  thesis.  Basically,  I  will  use  them  to  construct  more  precise  types  for 
functions  than  would  otherwise  be  possible,  so  that  the  constraint  reasoning  built  into  the  type 
system  can  recognize  more  efficient  code  as  safe.  They  are  not  strictly  necessary  in  the  sense 
that  it  is  possible  to  write  a  compiler  whose  output  is  well-typed  without  them,  but  they  deliver 
significant  performance  benefits  for  a  reasonably  small  metatheoretic  investment. 

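A guarded type $\varphi \Rightarrow \tau$ describes values that may be used at type $\tau$ only if the formula $\varphi$ is
true. This is captured by a subtyping rule:

$$
\frac{\Delta \vdash \tau : T \qquad \Delta \vdash \varphi\ \mathsf{true}}{\Delta \vdash \varphi \Rightarrow \tau \leq \tau}
$$

Using this rule, an operand $o$ of type $\varphi \Rightarrow \tau$ may be promoted to type $\tau$ if $\varphi$ is provable in the
constraint logic. If the truth of $\varphi$ cannot be derived, then no interesting use can be made of $o$.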




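$$
\frac{\Delta \vdash t : \mathbb{N}}{\Delta \vdash S(t) : T_W}
\qquad
\frac{}{\Delta \vdash n : S(n)}\;(0 \leq n \leq 2^{8W} - 1)
\qquad
\frac{\Delta \vdash t_1 = t_2\ \mathsf{true}}{\Delta \vdash S(t_1) \leq S(t_2)}
\qquad
\frac{\Delta \vdash t : \mathbb{N}}{\Delta \vdash S(t) \leq B_W}
$$

Figure 3.3: Elementary Rules for Singletons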


The introduction mechanism for guarded types differs slightly between Talt-R and MiniTALT-R.
In both systems, there is a guarded type introduction rule for values:

$$
\frac{(\Delta, \varphi\ \mathsf{true}); \Psi \vdash v : \tau}{\Delta; \Psi \vdash v : \varphi \Rightarrow \tau}
$$

According to this rule, to conclude that $v$ has type $\varphi \Rightarrow \tau$ it suffices to show that $v$ has type $\tau$ under
the assumption that $\varphi$ is true. Importantly, the derivation of $v : \tau$ may depend on the hypothesis
$\varphi$ true; $v$ need not be well-typed at all without it. It is worth noticing that guarded types bear a
certain similarity to $\forall$-types: both are introduced by typing a value under some new assumption,
and both are eliminated by subtyping rules that "validate" the assumption.

It is very important, more so than for any other kind of value, that one be able to give guarded
types to code pointers. In Talt-R, blocks of code are simply values, and so
the above rule is sufficient. In MiniTALT-R, instruction sequences are treated specially, so an
additional guarded type introduction rule for blocks is required:

$$
\frac{\Psi; (\Delta, \varphi\ \mathsf{true}) \vdash I : \tau\ \mathsf{block}}{\Psi; \Delta \vdash I : \varphi \Rightarrow \tau\ \mathsf{block}}
$$

This rule is analogous to the rule for values, and states that one may give a guarded type to (the
address of) a block of instructions that is well-typed under the assumption that the guard is true.

Singleton types in Talt and Talt-R play a role similar to that of singletons in DTAL [72] and
LTT [16]. In DTAL one writes a singleton type as int(x), where x is an "index expression"; in LTT
one writes $S_{\mathit{int}}(M)$, where M is the proof-language representation of an integer. The Talt-R type
$S(t)$ is well-formed when $t$ is a well-formed constraint term (i.e., it has kind $\mathbb{N}$), and contains at
most one value: the word-sized unsigned binary representation of the natural number denoted by
$t$. (If the meaning of $t$ is outside the representable range, then $S(t)$ is an empty type.) The most
elementary rules for singleton types are shown in Figure 3.3.

In  DTAL  and  LTT,  programs  may  perform  arithmetic  on  values  of  singleton  type,  and  the  type 
system  tracks  this  manipulation  symbolically  by  giving  an  appropriate  singleton  type  to  the  result. 
As  it  happens,  the  particular  use  I  have  in  mind  for  singleton  types  is  to  describe  a  counter  which 
is  repeatedly  decremented  until  it  reaches  zero.  Consequently,  the  only  form  of  arithmetic  I  will 
need  for  singletons  is  a  combined  subtract-and-conditional-jump  operation;  it  is  for  this  reason 
that the subjae instruction is included in TALT-R. As I have already mentioned, the instruction
sequence (subjae $r_d, o_1, o_2, o_3$ $I$) subtracts the value of $o_2$ from $o_1$ and stores the result in $r_d$; if
this result is greater than or equal to zero, control jumps to the address in $o_3$; otherwise, execution
continues with $I$. The subjae instruction has a special singleton-aware typing rule:

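$$
\frac{\begin{array}{c}
\Delta; \Psi; \Gamma\{r_d{:}B_W, \mathtt{ck}{:}t\} \vdash I \\
\Delta; \Psi; \Gamma \vdash o_3 : \forall \alpha{:}\mathbb{N}.\, (u = v + \alpha) \Rightarrow \Gamma\{r_d{:}S(\alpha), \mathtt{ck}{:}t\} \to 0 \\
\Delta; \Psi; \Gamma \vdash o_1 : S(u) \qquad \Delta; \Psi; \Gamma \vdash o_2 : S(v)
\end{array}}{\Delta; \Psi; \Gamma \vdash \mathtt{subjae}\ r_d, o_1, o_2, o_3\ I}
\;(\Gamma(\mathtt{ck}) = 2 + t)
$$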
This rule shows how to type a subjae instruction when the two operands to be subtracted have
singleton types $S(u)$ and $S(v)$ respectively. Notice the different typing conditions associated with
the two possible outcomes of the conditional jump. If the branch is taken, then the result is
nonnegative and hence the subtraction falls within the domain of natural number arithmetic; the
target of the jump is therefore allowed to assume that the result is some natural number $\alpha$ such
that the larger operand is equal to the sum of $\alpha$ and the smaller operand ($u = v + \alpha$). If the branch is not taken,
however, the result of the subtraction is negative and cannot be reasoned about in my theory of
natural numbers; hence the instruction sequence $I$ must be well-formed assuming only that the
destination register contains an integer. Finally, note that the virtual clock is decremented by two
instead of by one; this is because subjae is implemented by a sequence of two instructions on a
concrete IA-32 machine.
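Because the branch conditions in the subjae rule concern unsigned machine words, they can be checked exhaustively for a small word size. The Python sketch below assumes 8-bit words purely so that every operand pair can be enumerated (IA-32 words are 32 bits), and its function names are hypothetical. It confirms that whenever the jae branch is taken the destination holds a natural number $\alpha$ with $u = v + \alpha$, and that the fall-through case arises only when no such $\alpha$ exists.

```python
WORD_BITS = 8               # deliberately tiny so every operand pair can be enumerated
MASK = (1 << WORD_BITS) - 1


def subjae(u, v):
    """Return (rd, taken) for 'subjae' on unsigned words u and v."""
    rd = (u - v) & MASK
    taken = u >= v          # 'jae': no unsigned borrow
    return rd, taken


def check_subjae_rule():
    """Exhaustively confirm the two branch assumptions of the typing rule."""
    for u in range(MASK + 1):
        for v in range(MASK + 1):
            rd, taken = subjae(u, v)
            if taken:
                # Jump target assumes rd : S(alpha) with u = v + alpha.
                assert u == v + rd
            else:
                # No natural alpha satisfies u = v + alpha; I only gets rd : B_W.
                assert u < v
    return True
```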

3.3.4  Expanding  Singleton  Reasoning 

Although subjae is the only singleton instruction required for the compilation strategies I
describe in this thesis, and the only one supported by my implementation of Talt-R, there are a few
other  singleton-related  instructions  and  typing  rules  that  are  sound  in  principle  and  for  which 
support  could  easily  be  added. 

Checked Addition  It may be desirable to include a singleton-aware add instruction. The main
difficulty here is that the Talt-R constraint logic is concerned with (arbitrary) natural numbers
whereas arithmetic in assembly language is performed modulo $2^{8W}$. Expressing the results of
modular arithmetic in the constraint logic presents two difficulties: first, it requires adding
multiplication to the logic; second, it does not allow one to reason about inequalities as easily. A more
attractive solution is for the singleton addition operation to be a "double" instruction like subjae,
so that it automatically detects when its result is inconsistent with natural number arithmetic. Just
as a subtraction can be reflected in the logic as long as the result is not negative, an addition can
be accounted for as long as it does not overflow. The appropriate compound instruction for
singleton addition is therefore addjnc, or "add and jump if no carry." The syntax and typing rule
are analogous to subjae (but the typing premise for the jump target is simpler):

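$$
\frac{\begin{array}{c}
\Delta; \Psi; \Gamma\{r_d{:}B_W, \mathtt{ck}{:}t\} \vdash I \\
\Delta; \Psi; \Gamma \vdash o_3 : \Gamma\{r_d{:}S(u + v), \mathtt{ck}{:}t\} \to 0 \\
\Delta; \Psi; \Gamma \vdash o_1 : S(u) \qquad \Delta; \Psi; \Gamma \vdash o_2 : S(v)
\end{array}}{\Delta; \Psi; \Gamma \vdash \mathtt{addjnc}\ r_d, o_1, o_2, o_3\ I}
\;(\Gamma(\mathtt{ck}) = 2 + t)
$$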
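A comparable sketch for addjnc, again assuming 32-bit unsigned words and using a hypothetical function name, shows why the taken branch may give the result type $S(u + v)$: the jump is taken exactly when no carry occurs, that is, when the wraparound sum equals the mathematical one.

```python
WORD_BITS = 32              # IA-32 word size; an assumption of this sketch
MASK = (1 << WORD_BITS) - 1


def addjnc(u, v):
    """Model 'addjnc rd, o1, o2, o3' as 'add rd, o1, o2' followed by 'jnc o3'.

    Returns (value written to rd, whether control jumps to o3). The jump is
    taken exactly when the addition does not carry out of the word, which is
    when rd really equals u + v.
    """
    total = u + v
    rd = total & MASK
    carry = total > MASK
    return rd, not carry
```

The inverted form addjc discussed next would simply take the branch in the opposite case.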


Inverted Checked Arithmetic  In the subjae and addjnc instructions, the conditional branch is
taken if the result of the arithmetic operation can be given a useful singleton type and falls through
if it cannot. This is convenient for the particular idiom I have in mind for subjae (to be covered in
Chapter 6), but the system would be more symmetrical if there were alternative checked singleton
arithmetic instructions with the branch conditions reversed. There is no difficulty in principle with
adding subjb ("subtract and jump if below") and addjc ("add and jump if carry") to Talt-R.
The following typing rules soundly describe their semantics:

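$$
\frac{\begin{array}{c}
\Delta; \Psi; \Gamma \vdash o_3 : \Gamma\{r_d{:}B_W, \mathtt{ck}{:}t\} \to 0 \\
(\Delta, \alpha{:}\mathbb{N}, (u = v + \alpha)\ \mathsf{true}); \Psi; \Gamma\{r_d{:}S(\alpha), \mathtt{ck}{:}t\} \vdash I \\
\Delta; \Psi; \Gamma \vdash o_1 : S(u) \qquad \Delta; \Psi; \Gamma \vdash o_2 : S(v)
\end{array}}{\Delta; \Psi; \Gamma \vdash \mathtt{subjb}\ r_d, o_1, o_2, o_3\ I}
\;(\Gamma(\mathtt{ck}) = 2 + t)
$$

$$
\frac{\begin{array}{c}
\Delta; \Psi; \Gamma\{r_d{:}S(u + v), \mathtt{ck}{:}t\} \vdash I \\
\Delta; \Psi; \Gamma \vdash o_3 : \Gamma\{r_d{:}B_W, \mathtt{ck}{:}t\} \to 0 \\
\Delta; \Psi; \Gamma \vdash o_1 : S(u) \qquad \Delta; \Psi; \Gamma \vdash o_2 : S(v)
\end{array}}{\Delta; \Psi; \Gamma \vdash \mathtt{addjc}\ r_d, o_1, o_2, o_3\ I}
\;(\Gamma(\mathtt{ck}) = 2 + t)
$$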

In  fact,  it  is  important  for  performance  reasons  to  allow  the  programmer  or  compiler  to  choose 
the  sense  of  conditional  jumps.  Most  IA-32  processors  use  a  static  branch  prediction  heuristic 
which  assumes  (until  more  information  is  available)  that  backward  conditional  jumps  are  taken 
and forward conditional jumps are not taken. In order to produce fast code for superscalar
architectures with deep pipelines, programmers are encouraged to arrange their code so that these
assumptions  are  likely  to  be  accurate  [39]. 

Unchecked Arithmetic  When I discuss the application of Talt-R to different safety policies in
Chapter 6, I will encounter situations where a subtraction of two singleton values is required and
it is statically known which of the two quantities is larger. In this case the jae part of the subjae
instruction is unnecessary (and its presence obnoxious) because the outcome of that branch is known in advance.
The following rule for unchecked singleton subtraction allows the operation to proceed under
those conditions:

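$$
\frac{\begin{array}{c}
\Delta; \Psi; \Gamma \vdash o_1 : S(u + v) \qquad \Delta; \Psi; \Gamma \vdash o_2 : S(u) \\
\Delta; \Psi; \Gamma'\{\mathtt{ck}{:}t\} \vdash I \qquad \Delta; \Psi; \Gamma \vdash d : S(v) \to \Gamma'
\end{array}}{\Delta; \Psi; \Gamma \vdash \mathtt{sub}\ d, o_1, o_2\ I}
\;(\Gamma(\mathtt{ck}) = 1 + t)
$$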

A  rule  for  unchecked  singleton  addition,  analogous  to  this  one,  is  presumably  sound  but  seems 
less  likely  to  be  helpful  in  practice. 

Ordering  and  Addition  Another  reasonable  typing  rule  that  could  be  added  to  Talt-R  to  enrich 
the  capabilities  of  its  singleton  types,  but  for  which  I  have  not  found  an  immediate  need,  is  the 
following  subtyping  rule: 

$$
\frac{\Delta \vdash t \leq u\ \mathsf{true}}{\Delta \vdash S(u) \leq \exists \alpha{:}\mathbb{N}.\, S(t + \alpha)}
$$

It states that if $t \leq u$, then the number $u$ can be thought of as the sum of $t$ and an unknown natural
number.

3.4  Certification  and  Verification 

The  process  of  producing  and  verifying  certified  binaries  using  Talt-R  is  analogous  to  the  process 
for Talt described in Chapter 2. Specifically, a compiler wishing to target Talt-R outputs
programs in EXTALT-R, a user-friendly explicitly-typed assembly language. The Talt-R assembler
transforms  an  EXTALT-R  program  into  a  TBF  file  (see  Section  2.1.3,  Figure  2.2)  by  translating  the 
assembly  instructions  into  binary  machine  code  and  generating  a  certificate.  The  certificate  is  (the 
LF  representation  of)  an  XTALT-R  program.  The  consumer-side  certificate  verifier  is  analogous  to 
the  one  described  for  Talt. 

Because  of  the  strength  of  the  analogies  between  the  Talt  and  Talt-R  families  of  languages,  I 
will  sometimes  refer  to  XTALT  and  XTALT-R  as  "the  'X'  languages"  and  to  EXTALT  and  EXTALT-R 
as  "the  'Ex'  languages." 
3.4.1  XTALT-R


The Xtalt-R language, like Xtalt, is designed to be as easy as possible to type-check. This means
removing all the ambiguity and implicitness that make type-checking for Talt-R itself
(presumably) impossible. Like the "Mini" languages detailed in this thesis, XTALT and XTALT-R divide
the contents of a "program", which in Talt and Talt-R is just a single large value, into labeled
blocks; unlike the "Mini" languages, the "X" languages require the programmer to specify a type
for each label. This essentially identifies the memory typing ($\Psi$) under which the whole program
is supposed to be well-typed, eliminating an ambiguity that would be difficult to resolve by
inference.

An even greater source of difficulty in type-checking is the richness of the theory of subtyping in TALT and TALT-R. (Most of the difficulties here are reflected in the "Mini" languages.) To avoid the need to decide the (presumably) undecidable subtyping relation, the "X" languages replace the subtyping judgment $\Delta \vdash \tau_1 \le \tau_2$ with a calculus of coercions. Essentially, a coercion is a reified subtyping derivation: the coercion typing assertion $\Delta \vdash q : \tau_1 \le \tau_2$ means that the coercion $q$ represents a proof that $\tau_1$ is a subtype of $\tau_2$. The subsumption rule for operands in TALT or TALT-R,

$$\frac{\Delta; \Psi; \Gamma \vdash o : \tau' \qquad \Delta \vdash \tau' \le \tau}{\Delta; \Psi; \Gamma \vdash o : \tau}$$


is replaced in the respective "X" languages by coercion application, written $@q\,o$, which has the typing rule:

$$\frac{\Delta; \Psi; \Gamma \vdash o : \tau' \qquad \Delta \vdash q : \tau' \le \tau}{\Delta; \Psi; \Gamma \vdash @q\,o : \tau}$$


The coercions themselves, in turn, correspond almost exactly to subtyping derivations. There is a form of coercion for each subtyping rule in the underlying theory, so that $\Delta \vdash \tau \le \tau'$ is derivable if and only if there is a coercion $q$ such that $\Delta \vdash q : \tau \le \tau'$. For instance, the rule stating that any type is a subtype of nonsense (of the appropriate size) corresponds to a coercion called forget. That is:

$$\frac{\Delta \vdash \tau : \mathsf{T}_i}{\Delta \vdash \tau \le \mathsf{ns}_i} \quad\text{corresponds to}\quad \frac{\Delta \vdash \tau : \mathsf{T}_i}{\Delta \vdash \mathsf{forget} : \tau \le \mathsf{ns}_i}$$


XTALT-R must extend XTALT with coercion forms for all of the new subtyping rules that TALT-R adds to TALT. The ones that require the most novelty are the rules with constraint-truth premises, like the guard satisfaction rule:

$$\frac{\Delta \vdash \tau : \mathsf{T} \qquad \Delta \vdash \varphi\ \mathsf{true}}{\Delta \vdash (\varphi \Rightarrow \tau) \le \tau}$$

In order for a type-checker to accept a coercion witnessing this relation as well-formed, it must be evident that $\varphi$ is true. The easiest way to achieve this is to require the coercion itself to provide the evidence; in other words, it must contain a proof of $\varphi$. Thus, the coercion has the form $\mathsf{satisfy}\,\pi$, where $\pi$ is a proof term:

$$\frac{\Delta \vdash \pi : \varphi}{\Delta \vdash \mathsf{satisfy}\,\pi : (\varphi \Rightarrow \tau) \le \tau}$$

Proof terms reify truth derivations in exactly the same way that coercions reify subtyping derivations. (The truth derivations themselves are discussed in Chapter 4.) They appear in the coercion forms associated with all TALT-R subtyping rules that have constraint-truth premises.
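The toy Python model below illustrates why checking a coercion needs no search: the checker only inspects the coercion it is given. It is an illustration and not XTALT-R's real syntax. The tuple encodings, the `size_of` kinding with its assumed size of 4 for `int`, and the names `check_proof` and `check_coercion` are all hypothetical. Proof terms are restricted to uses of the hypothesis rule, and only the `forget` and `satisfy π` forms discussed above are modelled.

```python
from typing import Any, FrozenSet, Tuple

# Toy encodings (hypothetical, not XTALT-R syntax):
#   types     ("int",) | ("ns", size) | ("guard", phi, tau)  -- guard is phi => tau
#   formulas  e.g. ("le", "a", "b") for a <= b
#   proofs    ("hyp", phi): use the hypothesis "phi true" from the context
Type = Tuple
Formula = Tuple
Context = FrozenSet[Formula]


def size_of(tau: Type) -> int:
    """Hypothetical kinding tau : T_i, returning the size i."""
    if tau[0] == "guard":
        return size_of(tau[2])  # assumption: a guard does not change the size
    if tau[0] == "ns":
        return tau[1]
    return 4  # assumption: the toy "int" type has size 4


def check_proof(delta: Context, pi: Any, phi: Formula) -> bool:
    """Only the hypothesis rule is modelled: ("hyp", phi) proves phi if phi is in delta."""
    return pi == ("hyp", phi) and phi in delta


def check_coercion(delta: Context, q: Tuple, src: Type, dst: Type) -> bool:
    """Decide delta |- q : src <= dst by inspecting q; no proof search is involved."""
    if q == ("forget",):
        # forget : tau <= ns_i, where tau has size i
        return dst == ("ns", size_of(src))
    if q[0] == "satisfy":
        # satisfy pi : (phi => tau) <= tau, provided pi proves phi
        pi = q[1]
        return src[0] == "guard" and dst == src[2] and check_proof(delta, pi, src[1])
    return False
```

With the hypothesis `a <= b` in the context, `("satisfy", ("hyp", ("le", "a", "b")))` coerces a value guarded by that formula to the unguarded type. With an empty context the same coercion is rejected, because its embedded proof no longer checks.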
3.4.2 EXTALT-R

The "Ex" languages are the explicitly-typed languages generated by compilers and processed by the certifying assemblers. They are the most concrete, public and visible incarnations of TALT and TALT-R, and therefore the variants for which human-readability and -writability matter most. This creates a tension between the desire for ease of use on the one hand, and the fact that removing almost any of the annotations in the "X" languages leads to undecidability on the other.

EXTALT-R follows EXTALT in retaining the calculus of coercions used in the "X" languages to circumvent the presumed undecidability of subtyping. The one source of nonuniformity is the need for proof terms in XTALT-R to reify constraint truth derivations. EXTALT-R does not use proof terms, for three reasons:

- First, proof terms are a new syntactic class, distinct from type constructors and coercions. Adding them to EXTALT-R would represent a significant increase in syntactic complexity compared to EXTALT.
- Second, the very concept of a proof term is unfamiliar to most programmers other than the few who are type theory experts. Requiring proof terms to appear in programs would therefore greatly steepen the learning curve for EXTALT-R programming or, more importantly, for certifying compiler development.²
- Third, truth derivations are ubiquitous in the typings of even the most elementary TALT-R programs. The ck terms in register file types must very often be rewritten (using an equality formula and the register file subsumption rule) to match the form required by the omnipresent side conditions in instruction typing rules.

The purpose of these side conditions is to capture the idea that the virtual clock is decremented for every instruction. The EXTALT-R assembler performs this symbolic decrementation automatically, using a very simple heuristic. Suppose the type-checker encounters an instruction requiring $k$ "ticks" of the virtual clock while the current register file type is $\Gamma$. It then updates $\Gamma(\mathtt{ck})$ to $\mathrm{dec}(\Gamma(\mathtt{ck}), k)$ before moving to the next instruction, provided the latter is well-defined according to these rules:

$$\mathrm{dec}(n, k) = n - k \qquad \text{if } n \ge k$$
$$\mathrm{dec}(t + t', k) = \mathrm{dec}(t, k) + t'$$

If application of these rules fails, the assembler gives up and reports a type error.
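Here is a minimal Python sketch of this heuristic. It assumes a tuple encoding of static terms (`("lit", n)`, `("var", name)`, `("add", t1, t2)`). The encoding and the names `dec` and `step` are illustrative, not the assembler's actual code. `dec` returns `None` wherever the two rules give no answer, and `step` turns that into a type error.

```python
from typing import Optional, Tuple

# Hypothetical encoding of static terms: ("lit", n), ("var", name), ("add", t1, t2).
Term = Tuple


def dec(t: Term, k: int) -> Optional[Term]:
    """Symbolically subtract k ticks from clock term t; None means dec is undefined."""
    if t[0] == "lit":
        n = t[1]
        return ("lit", n - k) if n >= k else None   # dec(n, k) = n - k  if n >= k
    if t[0] == "add":
        left = dec(t[1], k)                          # dec(t + t', k) = dec(t, k) + t'
        return None if left is None else ("add", left, t[2])
    return None                                      # no rule applies to a bare variable


def step(ck: Term, k: int) -> Term:
    """Update the ck entry of the register file type for an instruction costing k ticks."""
    new_ck = dec(ck, k)
    if new_ck is None:
        raise TypeError(f"cannot decrement clock term {ck!r} by {k}")
    return new_ck
```

As the rules are written, only the left summand of a sum is ever decremented. A clock term $10 + \alpha$ can therefore be decremented, but $\alpha + 10$ cannot.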

The other constructs that demand proof terms are subtyping judgments. The most common are the register-file subtyping that must be checked for jump operands and the GUARD-ELIM rule reified as the satisfy coercion, discussed earlier. In each of these cases, the assembler does not require the program text to contain a proof term. Instead, it tries to construct one using depth-limited semantic proof search (DLP), a sound but incomplete procedure for the TALT-R constraint logic discussed in the next chapter. The logic is in fact decidable. However, DLP is sufficient for the purposes of my responsiveness-certifying compiler, and since I have not found a decision procedure that is as simple and efficient, I have not attempted to implement anything more advanced. Future work on TALT-R may require replacing the existing constraint logic with something more heavyweight, in which case the certification process would presumably need to incorporate a serious theorem prover.


²Coercions, of the kind found in the "X" and "Ex" languages, are also unfamiliar. Their presence in the language can be rationalized to the extent that they act more or less like functions that change the type of a value, like the C-style type casts that everyone understands. On the other hand, the coercions that appear in most EXTALT programs are verbose. This suggests that a concise and user-friendly representation of the information they carry is a language design issue worthy of attention.
3.5 Chapter Summary

In this chapter, I stated a specific timing policy that applies to many real-world situations. Motivated by this policy, I presented most of the design of a type system to certify compliance with it (the remainder of the static semantics will be revealed in the next chapter).

The type system, TALT-R, is an extension of TALT with clock reasoning in the style of TALres and dependent types in the style of DTAL. I have described the salient features of its static semantics. These include potential variations which, as I will show later on, open the door to certification of other interesting timing and resource control policies.


Chapter 4

The TALT-R Constraint Logic


Its constraint logic is one of the most important features of TALT-R: it is what makes it possible to generate certifiable programs without inserting excessively many yields. Through the mechanism of guarded types, discussed in Chapter 3, the typing of a program can depend on the "truth" of some constraint formulas. This feature lets portions of programs be given types that describe their clock behavior more precisely than would otherwise be possible. In turn, it becomes possible to type programs whose clock behavior requires more subtle justification than would otherwise be allowed.

To define a type system that has this built-in constraint logic, and to prove theorems about that system formally, it is necessary to give a formal definition for the logic itself. This involves not only specifying the "language" of formulas upon which typing can depend, but also defining precisely what it means for a formula to be true. In the design of TALT-R it was critical to find a balance between simplicity and power. The simpler the logic, the less effort a formal safety proof would require, but a certain amount of proving power was necessary in order to type interesting programs.¹ My goal, therefore, was to find the simplest possible logic that could accommodate my compilation strategy. The results of that exercise make up this chapter. I first discuss some general considerations that influenced the design of the TALT-R constraint logic and then set down its definition. The remainder of the chapter investigates the metatheory of the logic more deeply, to better understand its algorithmic properties and characterize its power.


4.1 The Logic

4.1.1 Terms and Formulas

The TALT-R constraint logic can be seen as a language of first-order predicates that lacks any logical connectives or quantifiers. Adding the propositional connectives would pose no theoretical difficulty, and I conjecture that some limited use of quantifiers would not either.

Formally, the terms of the logic are the static terms of kind $\mathbb{N}$ and the formulas are the constraint formulas as defined in Chapter 3. In this chapter, though, I want to consider the logic in isolation from the rest of the type system. I therefore restrict my attention to "simple" terms and formulas, which have a very simple structure.

¹As mentioned at the beginning of Chapter 3, I did not actually do a formal safety proof for TALT-R. However, making the execution of such a proof as straightforward as possible was still an important design criterion.
Definition 4.1 The simple terms are the static terms generated by the grammar:

$$t ::= \alpha \mid n \mid t_1 + t_2$$

A simple formula is a constraint formula containing only simple terms.

A pre-simple context is one that contains no kinding assumptions $\alpha{:}K$ with $K \ne \mathbb{N}$.

A simple context is a pre-simple context in which all the constraint terms that appear are simple.

(Alternatively, the simple terms are the β-normal terms $t$ for which there exists a pre-simple context $\Delta$ such that $\Delta \vdash t : \mathbb{N}$.) Beginning with Section 4.1.2, I shall tacitly assume that all contexts, terms and formulas encountered in the remainder of this chapter are simple. Fortunately, the "simple" system and the unrestricted system are of comparable power. In particular, for any well-formed context $\Delta$ and well-formed formula $\varphi$, it can be shown that there exist a simple context $\Delta'$ and a simple formula $\varphi'$ such that $\Delta \vdash \varphi\ \mathsf{true}$ if and only if $\Delta' \vdash \varphi'\ \mathsf{true}$. Moreover, the simple versions can be computed from the originals, so decidability of the simple system implies decidability for well-formed judgments in the unrestricted system.²

4.1.2 Defining Truth

While discussing TALT-R's static semantics in Chapter 3, I referred to, but did not define, the constraint truth judgment form $\Delta \vdash \varphi\ \mathsf{true}$, which appears as a premise in a few key rules of the static semantics. The business of this section is to give a definition for this judgment. First, though, I will review some of the requirements for this definition.

That the truth judgment should be sound seems too obvious a necessity to need any discussion. After all, how can the type system be sound for programs as a whole if one of its judgments does not have the intended meaning? What makes this requirement interesting for TALT-R is that it is not enough for the truth judgment merely to be sound; I need to be able to prove its soundness in Twelf. As I shall argue, this rules out simple "denotational" definitions of truth such as the one proposed for integer constraints in DTAL [72]. Instead, I will define truth "syntactically", using a carefully chosen set of axioms and inference rules.³

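So, what does soundness of the constraint logic mean, and where does it come up in the Twelf proof of type safety? To answer these questions, consider the rule for register file subtyping, which has a truth premise (the one in the box):

$$\frac{\boxed{\Delta \vdash t' \le t\ \mathsf{true}} \qquad \Delta \vdash \tau_i \le \tau_i' \ \text{ for } 1 \le i \le N}{\Delta \vdash \{\mathtt{eax}{:}\tau_{ax}, \ldots, \mathtt{ebp}{:}\tau_{bp}, \mathtt{esp}{:}\tau_{sp}, \mathtt{ck}{:}t\} \le \{\mathtt{eax}{:}\tau'_{ax}, \ldots, \mathtt{ebp}{:}\tau'_{bp}, \mathtt{esp}{:}\tau'_{sp}, \mathtt{ck}{:}t'\}}$$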

Let $\Gamma$ be the register file type on the left and $\Gamma'$ the one on the right. Then this rule permits $\Gamma$ to be a subtype of $\Gamma'$ only if the judgment $\Delta \vdash \Gamma'(\mathtt{ck}) \le \Gamma(\mathtt{ck})\ \mathsf{true}$ holds. The key lemma in which the effect of this premise is felt is the register file subsumption lemma:

Lemma 4.1 If $\cdot \vdash \Gamma \le \Gamma'$ and $\Psi \vdash R : \Gamma$, then $\Psi \vdash R : \Gamma'$.

Here $R$ is a register file. The judgment form $\Psi \vdash R : \Gamma$ means that the register file $R$ has type $\Gamma$ under heap assumptions $\Psi$. We have not encountered it before because register files appear only in the dynamic semantics. Among other things, the judgment requires that the clock value in $R$ be greater than or equal to the number denoted by $\Gamma(\mathtt{ck})$. To prove this lemma, then, we need to know (again among other things) the following. Suppose the number denoted by $\Gamma(\mathtt{ck})$ is less than or equal to $m$, and the truth judgment $\cdot \vdash \Gamma'(\mathtt{ck}) \le \Gamma(\mathtt{ck})\ \mathsf{true}$ holds. Then the number denoted by $\Gamma'(\mathtt{ck})$ is also less than or equal to $m$. We get this from the soundness lemma:

²This is perhaps an unfair simplification of the TALT type theory as implemented. In particular, it depends on the fact that the static term language of MiniTALT-R is strongly β-normalizing. In fact, the static term language of TALT as formalized in LF is not normalizing, and the truth judgment in that system is undecidable.

³Readers familiar enough with Twelf to consider this decision a no-brainer may skip the next few paragraphs.

Lemma 4.2 (Soundness of Truth) If $\cdot \vdash n' \le n\ \mathsf{true}$, then $n' \le n$.

This is the most important soundness result for the TALT-R constraint logic.

Observe that the soundness lemma, which we wish to prove as a Twelf metatheorem, has an instance of the truth judgment on the left side of an implication. Suppose the definition of truth involved any universal quantification. Then the soundness lemma would not be a $\Pi_2$ sentence, and hence it would not be provable in Twelf, whose metatheorems are restricted to $\Pi_2$ form. As a result, we can forget about "semantic" definitions like the following:

Non-Definition. $\Delta \vdash \varphi\ \mathsf{true}$ iff the entailment it denotes holds over the natural numbers: that is, iff $\varphi$ holds under every substitution of natural numbers for the constraint term variables declared in $\Delta$ that makes the constraint hypotheses in $\Delta$ hold.

The fact that this definition presupposes knowledge of the natural numbers, which are not built into Twelf, is annoying, but it is not the issue. The real problem is the quantification "for any substitution...", which cannot be encoded with an LF type.

The answer to this, of course, is to define the truth judgment the way anything else is defined in Twelf: inductively, as the least set of judgments closed under certain inference rules. This means that TALT-R's notion of "truth" is really more like "provability", with the rules for constructing proofs fixed in advance as part of the type system.

Formal theories of the natural numbers seem to come in two main varieties, neither of which is appropriate for TALT-R. The first variety comprises theories like LXres [18]. Their power comes from a rich term language and the associated theory of equality. The term language features addition, multiplication and primitive recursion over natural numbers, as well as introduction and elimination forms for some other types. The theory of equality understands basic properties of addition and multiplication as well as βη-conversion. Such a theory is simple to describe, but it does not support hypothetical reasoning, which TALT-R requires if guarded types are to make sense.

The second variety takes the form of a set of axioms expressed in a logic. The logic is generally first-, second- or higher-order classical or intuitionistic predicate logic, and the axioms usually resemble those of Peano or Presburger arithmetic. These theories generally do support hypothetical reasoning, and as a group they are very powerful: in principle, one could use the Zermelo-Fraenkel axioms for sets as the basis for such a theory and formalize all the mathematics one needed. Unfortunately, this class of theories is also unacceptable for TALT-R. That most of them are undecidable (Presburger arithmetic being the notable exception) is the least of their drawbacks: after all, typing in TALT is already presumed undecidable.

Of much greater concern is the fact that proofs of even the most routine facts in these theories are large, difficult to construct, and even harder (for humans) to read. This would be crippling for TALT-R, since the certificate for a program must include proofs of all the constraint judgments on which its typing depends. These proofs would have to be generated during certification and transmitted over a network. Producing formal proofs of all the required "lemmas" would take a large investment of time. The proofs would also have to be supplied to the certificate generation algorithm, and their complexity would reduce the human-readability and -writability of EXTALT-R, the explicitly typed input language of the assembler. Finally, and most damningly, the consistency of the axiomatization would have to be proven in Twelf. Perhaps this could be managed, but to the best of my knowledge there is no published work on applying Twelf to consistency proofs in theories this complex.

The goal for TALT-R, then, was to devise a simple, though necessarily incomplete, axiomatization of natural number arithmetic subject to three considerations:

1. The soundness of the theory must be provable as a Twelf metatheorem.
2. Although the theory need not be complete in any formal sense, it must be "complete enough" to derive all of the judgments necessary to type the output of my compiler.
3. The theory must be decidable, and furthermore it must be possible to produce proofs for derivable judgments automatically.

Note on Decidability

Typing in the implicitly-typed TALT language is undecidable. A certificate for a TALT program must therefore contain at least enough information to convince the verifier that a typing derivation exists. So that the TALT assembler can produce such a certificate, the EXTALT input to the assembler must be heavily annotated. In particular, wherever the typing of the program depends on subsumption, the EXTALT program must contain a coercion, which is really a representation of a derivation of the necessary subtyping relationship. It might seem reasonable, therefore, to require an EXTALT-R program to include proofs of the constraint formulas on which its typing depends.

Forcing the arithmetic proofs to be present in the EXTALT-R representation of a program has a serious drawback: it requires the person or program that generates the EXTALT-R program to produce the proofs. This seems like an excessive burden. Proofs in any theory of arithmetic are likely to be dense and hard for humans to read, so EXTALT-R programs whose typing depends on them will be very difficult to write or debug by hand. As a result of these considerations, I adopted the following view. The TALT-R type theory itself is defined in terms of a particular axiomatization of the constraint logic. A programmer or compiler writer generating EXTALT-R code, however, need not understand the structure of proofs, which is an implementation detail. Furthermore, theorem proving in the TALT-R constraint logic is a task common to all producers of TALT-R programs, so it is the responsibility of the TALT-R implementor (that is, me) to provide a tool that does it. The theorem prover is integrated into the assembler, so EXTALT-R programs never need to contain proofs.

In principle, the assembler's theorem prover does not need to be complete, even with respect to the TALT-R constraint theory (which itself need not be complete with respect to actual natural number arithmetic). However, the input language of the assembler should have a concise and accessible definition, so that programmers and compiler writers have some basis on which to predict whether their EXTALT-R code will be accepted. Since the assembler now includes the constraint prover, the definition of EXTALT-R must describe the set of constraint judgments the prover will be able to derive. In other words, if the theorem prover I build into the assembler decides a proper subset of TALT-R's theory of arithmetic, I must be able to give a concise definition of that decidable subset.

In fact, I have done a combination of these things. The theory presented in this chapter is decidable as-is, as I prove in Section 4.2, but I have not implemented a complete decision procedure. Instead, I describe a convenient metric of proof complexity. This metric turns a sound and complete but unbounded proof search into an incomplete but terminating procedure, which I have implemented in the TALT-R assembler. I claim, and will demonstrate in Chapter 7, that a proof search bounded in this way is "complete enough" for the compilation strategy described there.
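$$\frac{(\varphi\ \mathsf{true}) \in \Delta}{\Delta \vdash \varphi\ \mathsf{true}} \qquad \frac{\Delta \vdash t : \mathbb{N}}{\Delta \vdash t = t\ \mathsf{true}} \qquad \frac{\Delta \vdash t_2 = t_1\ \mathsf{true}}{\Delta \vdash t_1 = t_2\ \mathsf{true}} \qquad \frac{\Delta \vdash t_1 = t_3\ \mathsf{true} \quad \Delta \vdash t_3 = t_2\ \mathsf{true}}{\Delta \vdash t_1 = t_2\ \mathsf{true}}$$

$$\frac{\Delta \vdash t_1 = t_1'\ \mathsf{true} \quad \Delta \vdash t_2 = t_2'\ \mathsf{true}}{\Delta \vdash t_1 + t_2 = t_1' + t_2'\ \mathsf{true}} \qquad \frac{}{\Delta \vdash m + n = k\ \mathsf{true}}\ (k = m + n) \qquad \frac{\Delta \vdash t : \mathbb{N}}{\Delta \vdash 0 + t = t\ \mathsf{true}}$$

$$\frac{\Delta \vdash t_1 : \mathbb{N} \quad \Delta \vdash t_2 : \mathbb{N}}{\Delta \vdash t_1 + t_2 = t_2 + t_1\ \mathsf{true}} \qquad \frac{\Delta \vdash t_i : \mathbb{N}\ \ (\text{for } i = 1, 2, 3)}{\Delta \vdash (t_1 + t_2) + t_3 = t_1 + (t_2 + t_3)\ \mathsf{true}} \qquad \frac{}{\Delta \vdash m \le n\ \mathsf{true}}\ (m \le n)$$

$$\frac{\Delta \vdash t_1 = t_2\ \mathsf{true}}{\Delta \vdash t_1 \le t_2\ \mathsf{true}} \qquad \frac{\Delta \vdash t_1 \le t_3\ \mathsf{true} \quad \Delta \vdash t_3 \le t_2\ \mathsf{true}}{\Delta \vdash t_1 \le t_2\ \mathsf{true}} \qquad \frac{\Delta \vdash t_1 \le t_2\ \mathsf{true} \quad \Delta \vdash t_2 \le t_1\ \mathsf{true}}{\Delta \vdash t_1 = t_2\ \mathsf{true}}$$

$$\frac{\Delta \vdash t_1 \le t_1'\ \mathsf{true} \quad \Delta \vdash t_2 \le t_2'\ \mathsf{true}}{\Delta \vdash t_1 + t_2 \le t_1' + t_2'\ \mathsf{true}} \qquad \frac{\Delta \vdash t + t_1 \le t + t_2\ \mathsf{true}}{\Delta \vdash t_1 \le t_2\ \mathsf{true}} \qquad \frac{\Delta \vdash t : \mathbb{N}}{\Delta \vdash 0 \le t\ \mathsf{true}}$$


Figure 4.1: Truth of Formulas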


The Truth Judgment

The rules defining the truth judgment are given in Figure 4.1. As mentioned earlier, we will assume for the duration of this chapter that contexts $\Delta$ contain only two kinds of entries: constraint term kinding assumptions ($\alpha{:}\mathbb{N}$), and constraint hypotheses of the form $\varphi\ \mathsf{true}$ where $\varphi$ is simple.

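The following "substitution" or "cut" property will come in useful later on.

Proposition 4.1 (Cut) If $\Delta, \varphi\ \mathsf{true} \vdash \varphi'\ \mathsf{true}$ and $\Delta \vdash \varphi\ \mathsf{true}$, then $\Delta \vdash \varphi'\ \mathsf{true}$.

Since truth is a hypothetical judgment rather than a sequent-style proof system, the proof of this property is very straightforward. It proceeds by induction on the first derivation, replacing each use of the hypothesis rule for $\varphi$ with the given derivation of $\Delta \vdash \varphi\ \mathsf{true}$.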

4.2 Decidability

The main result of this section is the decidability of the TALT-R constraint logic:

Given $\Delta$ and $\varphi$, it is decidable whether or not $\Delta \vdash \varphi\ \mathsf{true}$.

The fact that this logic is "complete enough" is part of the type preservation theorem for my compilation strategy, which I will cover later on. Moreover, its soundness can be proven in Twelf.⁴

4.2.1 Proof Overview

The proof of decidability is based on two main insights. Taken together, they reveal that the existence of a proof for a given formula (in a given context) is equivalent to the existence of a feasible solution to a certain integer linear program that is easy to extract from the formula and the context. (This may seem anticlimactic: after all, the use of integer programming to solve constraints of this kind is quite common. However, deciding the validity of a constraint over the integers is not the same thing as deciding its derivability in this logic, which is relatively weak. I therefore had no a priori expectation that any off-the-shelf algorithm would work.)

⁴Essentially the same constraint logic was added to the TALT implementation by Karl Crary shortly after I proposed its inclusion in TALT-R. The Twelf safety proof for TALT therefore includes a soundness proof for this logic.

The first insight is that, modulo regrouping and reordering, a term $t$ is just a finite sum of "atomic" terms, each of which is either a variable or a literal number. Imagine "combining like terms" as in high-school algebra, which is really just counting the occurrences of each variable and adding together all the literals. Seen this way, the term is essentially a linear polynomial in several variables with natural number coefficients. This suggests a notion of canonical form for terms. To reduce a term to canonical form, simply reassociate and reorder its atomic subterms until it is, say, a right-associated sum with all the occurrences of each variable appearing consecutively and a single literal at the end. This representation is canonical except for the ordering of the variables.

For the purposes of this proof, I have found that the notion of syntactic reduction to canonical form is not particularly convenient. Instead, I "factor" the extraction of a canonical form into two functions. The first is an interpretation function $[\![\cdot]\!]$ mapping constraint terms to linear polynomials (a concept I will make precise shortly). The second is a representation function $\mathcal{R}$ in the other direction. The linear polynomials here are objects of the metatheory, not constraint terms. The composition $\mathcal{R}[\![\cdot]\!]$ can be viewed as the extraction of a canonical form in the following sense. Under suitable well-formedness conditions, if $\Delta$ contains no hypotheses of the form $\varphi\ \mathsf{true}$, then $\Delta \vdash t = u\ \mathsf{true}$ if and only if $\mathcal{R}[\![t]\!]$ and $\mathcal{R}[\![u]\!]$ are the same. (Defining the representation function requires assuming that the set Var of variables is well-ordered.) However, the linear polynomial $[\![t]\!]$ is a more convenient object to reason about than the term $\mathcal{R}[\![t]\!]$. The former lives in a metatheoretic structure with an addition operator that is commutative and associative. The latter lives in a syntactic theory whose formal addition operator is commutative and associative only up to provable equality.
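The following minimal Python sketch shows the interpretation $[\![\cdot]\!]$ and the representation function $\mathcal{R}$. It assumes a tuple encoding of terms and uses `collections.Counter` for a finite-support function $\mathrm{Var} \cup \{1\} \to \mathbb{N}$. The key `1` plays the role of the point $1$, and variable names are strings whose alphabetical order stands in for the assumed well-ordering of Var. The names `interp`, `represent` and `canonical` are hypothetical. Without hypotheses in the context, two terms that differ only by reassociation and reordering receive the same polynomial, and hence the same canonical form.

```python
from collections import Counter
from typing import Tuple

# Hypothetical term encoding: ("lit", n), ("var", name), ("add", t1, t2).
Term = Tuple
ONE = 1  # the point "1" of Var ∪ {1}; variables are strings, so keys never clash


def interp(t: Term) -> Counter:
    """[[t]]: a nonnegative linear polynomial as a Counter (absent key = coefficient 0)."""
    if t[0] == "lit":
        return Counter({ONE: t[1]}) if t[1] else Counter()
    if t[0] == "var":
        return Counter({t[1]: 1})
    return interp(t[1]) + interp(t[2])  # pointwise sum of polynomials


def represent(p: Counter) -> Term:
    """R: a right-associated sum with variables in sorted order, each repeated
    according to its coefficient, followed by a single literal at the end."""
    atoms = [("var", v) for v in sorted(x for x in p if x != ONE) for _ in range(p[v])]
    term: Term = ("lit", p[ONE])  # p[ONE] is 0 when the literal part is absent
    for atom in reversed(atoms):
        term = ("add", atom, term)
    return term


def canonical(t: Term) -> Term:
    """The composition R[[t]]."""
    return represent(interp(t))
```

For example, $(a + 2) + (b + a)$ and $a + (a + (b + 2))$ both map to the polynomial $2 + 2a + b$, and `canonical` returns the second term for both.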

The second insight is that treating terms as polynomials trivializes most of the axioms in the logic. I devise an interpretation for formulas that trivializes several other rules in the same way. The rules making provable equality an equivalence relation and $\le$ a partial order become trivial, and, importantly, so does the rule allowing cancellation of a subterm appearing on both sides of an inequality. Under this interpretation, only three rules remain nontrivial:

- the hypothesis rule;
- the nonnegativity rule, giving $0 \le t$ for any $t$;
- the monotonicity rule, allowing inequality formulas to be "added" as in high-school algebra.

That is, a proof simply adds together several hypotheses along with one instance of the nonnegativity axiom. Whether this can be done for a particular goal formula and set of hypotheses is easily formulated as an integer program.
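The integer program can be illustrated with a small Python sketch. It rests on my reading of the overview above, which is an assumption and not a construction given in the thesis. Under that reading, with inequality hypotheses $t_i \le u_i$, a goal $t \le u$ is derivable exactly when there are natural numbers $c_i$ such that $[\![u]\!] - [\![t]\!] - \sum_i c_i([\![u_i]\!] - [\![t_i]\!])$ is pointwise nonnegative. The proof adds $c_i$ copies of each hypothesis and one instance of $0 \le F$, where $F$ is that nonnegative remainder, and then cancels the common parts. Instead of calling an integer-programming solver, the sketch tries all multipliers up to a small `bound`, so `None` only means that no solution was found within that bound. Equality hypotheses, which would contribute both directions, are left out. The names `diff` and `derivable_le` are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Optional, Tuple

# Polynomials are Counters keyed by variable names (str) or 1 for the literal part.
Poly = Counter


def diff(p: Poly, q: Poly) -> Dict[object, int]:
    """Signed pointwise difference p - q (Counter subtraction would drop negatives)."""
    return {x: p.get(x, 0) - q.get(x, 0) for x in set(p) | set(q)}


def derivable_le(goal: Tuple[Poly, Poly],
                 hyps: List[Tuple[Poly, Poly]],
                 bound: int = 3) -> Optional[Tuple[int, ...]]:
    """Search multipliers c_i in 0..bound such that
    [u] - [t] - sum_i c_i ([u_i] - [t_i]) >= 0 at every point.
    Brute force stands in for an integer-programming solver."""
    t, u = goal
    g = diff(u, t)
    ds = [diff(ui, ti) for ti, ui in hyps]
    keys = set(g).union(*ds)
    for cs in product(range(bound + 1), repeat=len(ds)):
        if all(g.get(x, 0) - sum(c * d.get(x, 0) for c, d in zip(cs, ds)) >= 0
               for x in keys):
            return cs  # the remainder is the polynomial F for 0 <= F
    return None
```

With hypotheses $a \le b$ and $b \le c$, the goal $a \le c$ is found with multipliers `(1, 1)`, which corresponds to transitivity. The goal $c \le a$ has no solution.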

4.2.2 Interpretation of Terms

Syntactically, the terms of the constraint logic contain both the natural numbers and the constraint term variables, and they are closed under formal addition. Viewed modulo provable equality, formal addition is commutative and associative, agrees with integer addition on integer arguments, and has zero as an identity. My first step in showing that the provable equality and provable inequality relations are decidable is to interpret constraint terms in a structure with a computable addition operation. That operation has these properties up to equality, rather than only up to provable equality.
Definition 4.2 A linear polynomial is a function $P : \mathrm{Var} \cup \{1\} \to \mathbb{Z}$ that is zero at all but finitely many points. We call the set of all such functions $\mathrm{Poly}$. If $P(x) \ge 0$ for all $x$, we say $P$ is nonnegative; $\mathrm{Poly}^+$ is the set of all nonnegative linear polynomials. The letters $F$ and $G$ will be understood to range over $\mathrm{Poly}^+$.

As a matter of notation, if $P(1) = m_0$ and $P(a_i) = m_i$ for $1 \le i \le n$ and $P(b) = 0$ for all $b \notin \{a_1, \ldots, a_n\}$, then we write $P$ as

$$m_0 + m_1 a_1 + \cdots + m_n a_n$$

optionally omitting $m_0$ if it is zero. Note that the order of the terms in such a rendering is insignificant.


The sum $P + Q$ is defined as the pointwise sum of the functions $P$ and $Q$; this gives the usual meaning of polynomial addition. Subtraction and scalar multiplication are defined analogously. I will also need the pointwise meet operation $\sqcap$ and the "bounded subtraction" operation $\ominus$, defined as follows:

$$(P \sqcap Q)(x) = \min(P(x), Q(x))$$

$$(P \ominus Q)(x) = P(x) - \min(P(x), Q(x)) = \begin{cases} P(x) - Q(x) & \text{if } P(x) \ge Q(x) \\ 0 & \text{if } Q(x) \ge P(x) \end{cases}$$

It will be important that for any polynomials $P$ and $Q$, $P = (P \ominus Q) + (P \sqcap Q)$.
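To make these pointwise operations concrete, here is a minimal Python sketch. It assumes that a linear polynomial is stored as a dictionary from keys to integer coefficients, with variables as strings, a hypothetical constant key `ONE` standing for the point $1$, and zero coefficients omitted. The helper names (`add`, `meet`, `bsub`) are illustrative only and do not come from the TALT-R implementation. The final assertion checks the identity $P = (P \ominus Q) + (P \sqcap Q)$ on one example.

```python
from typing import Dict, Union

ONE = 1                 # hypothetical key standing for the constant point "1"
Key = Union[int, str]   # variables are represented as strings
Poly = Dict[Key, int]   # zero coefficients are omitted


def norm(p: Poly) -> Poly:
    """Drop zero coefficients so that equal polynomials compare equal as dicts."""
    return {k: c for k, c in p.items() if c != 0}


def add(p: Poly, q: Poly) -> Poly:
    """Pointwise sum: (P + Q)(x) = P(x) + Q(x)."""
    return norm({k: p.get(k, 0) + q.get(k, 0) for k in p.keys() | q.keys()})


def meet(p: Poly, q: Poly) -> Poly:
    """Pointwise meet: (P ⊓ Q)(x) = min(P(x), Q(x))."""
    return norm({k: min(p.get(k, 0), q.get(k, 0)) for k in p.keys() | q.keys()})


def bsub(p: Poly, q: Poly) -> Poly:
    """Bounded subtraction: (P ⊖ Q)(x) = P(x) - min(P(x), Q(x))."""
    return norm({k: p.get(k, 0) - min(p.get(k, 0), q.get(k, 0))
                 for k in p.keys() | q.keys()})


def is_nonneg(p: Poly) -> bool:
    return all(c >= 0 for c in p.values())


# P = 3 + 2a + b and Q = 1 + 5a + c
P = {ONE: 3, "a": 2, "b": 1}
Q = {ONE: 1, "a": 5, "c": 1}
assert meet(P, Q) == {ONE: 1, "a": 2}
assert bsub(P, Q) == {ONE: 2, "b": 1}
assert add(bsub(P, Q), meet(P, Q)) == P   # P = (P ⊖ Q) + (P ⊓ Q)
```

Note that $P \ominus Q$ is always nonnegative, even when $P$ and $Q$ are not, which is what later lets it serve as one side of an inequality formula.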

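If $F$ is a nonnegative linear polynomial, we say that $\Delta \vdash F$ if, for all variables $a$, $F(a) \ne 0$ implies $a \in \Delta$. Clearly, if $\Delta \vdash F$ and $\Delta \vdash G$ then $\Delta \vdash F + G$; furthermore, if $\Delta \vdash F + G$ then $\Delta \vdash F$ and $\Delta \vdash G$.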


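Definition 4.3 The interpretation $\llbracket\cdot\rrbracket : \mathit{Term} \to \mathrm{Poly}^+$ of terms as nonnegative linear polynomials is defined as follows:

$$\llbracket n \rrbracket = n \qquad \llbracket a \rrbracket = a \qquad \llbracket t + u \rrbracket = \llbracket t \rrbracket + \llbracket u \rrbracket$$

Observe that $\mathrm{Poly}^+$ contains convenient subsets corresponding to the variables and the natural numbers, and that it forms a commutative monoid whose unit is $\llbracket 0 \rrbracket$.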
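The interpretation of terms can be sketched in a few lines of Python. The term encoding is a hypothetical one chosen for illustration: a numeral is a Python `int`, a variable is a `str`, and formal addition $t + u$ is the tuple `("+", t, u)`. Because the result is a dictionary with zero coefficients dropped, terms that differ only by rearrangement or by arithmetic on numerals map to equal values.

```python
from typing import Dict, Tuple, Union

ONE = 1  # hypothetical key for the constant coefficient (the point 1)
Poly = Dict[Union[int, str], int]
# Hypothetical encoding: int = numeral, str = variable, ("+", t, u) = t + u.
Term = Union[int, str, Tuple[str, "Term", "Term"]]


def poly_add(p: Poly, q: Poly) -> Poly:
    """Pointwise sum, dropping zero coefficients."""
    out = {k: p.get(k, 0) + q.get(k, 0) for k in p.keys() | q.keys()}
    return {k: c for k, c in out.items() if c != 0}


def interp(t: Term) -> Poly:
    """[[n]] = n, [[a]] = a, [[t + u]] = [[t]] + [[u]]."""
    if isinstance(t, int):
        return {ONE: t} if t != 0 else {}
    if isinstance(t, str):
        return {t: 1}
    _, left, right = t
    return poly_add(interp(left), interp(right))


# (a + 2) + (3 + a) and (a + a) + 5 denote the same polynomial 5 + 2a.
t1 = ("+", ("+", "a", 2), ("+", 3, "a"))
t2 = ("+", ("+", "a", "a"), 5)
assert interp(t1) == interp(t2) == {ONE: 5, "a": 2}
```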

Let $\preceq$ denote the pointwise partial ordering on polynomials:

$$P \preceq Q \iff \text{for all } x \in \mathrm{Var} \cup \{1\},\ P(x) \le Q(x)$$

We will say $P \prec Q$ when $P \preceq Q$ and $P \ne Q$. When restricted to nonnegative polynomials, $\prec$ is well-founded. Assume the set $\mathrm{Var}$ of variables is also well-ordered by some relation $\sqsubset$. The resulting induction principles ensure that the following representation function is well-defined.

Definition 4.4 The representation $\mathcal{R} : \mathrm{Poly}^+ \to \mathit{Term}$ of nonnegative linear polynomials as terms is defined as follows:

• For constant polynomials $m$, $\mathcal{R}(m) = m$.

• For non-constant polynomials $F$, $\mathcal{R}(F) = a + \mathcal{R}(F - a)$ where $a = \min\{x \in \mathrm{Var} \mid F(x) \ne 0\}$.

The minimum is with respect to $\sqsubset$. Note that this is a valid inductive definition, since in the second case $F - a$ is nonnegative and $F - a \prec F$.
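A sketch of the representation function $\mathcal{R}$ in the same hypothetical Python encoding follows. It assumes, purely for illustration, that the well-ordering $\sqsubset$ on variables is Python's ordering on strings. The assertion at the end checks Lemma 4.3, $\llbracket \mathcal{R}(F) \rrbracket = F$, on one example.

```python
from typing import Dict, Tuple, Union

ONE = 1  # hypothetical key for the constant coefficient
Poly = Dict[Union[int, str], int]
Term = Union[int, str, Tuple[str, "Term", "Term"]]  # int | var | ("+", t, u)


def interp(t: Term) -> Poly:
    """[[t]], as in Definition 4.3."""
    if isinstance(t, int):
        return {ONE: t} if t != 0 else {}
    if isinstance(t, str):
        return {t: 1}
    _, left, right = t
    p, q = interp(left), interp(right)
    out = {k: p.get(k, 0) + q.get(k, 0) for k in p.keys() | q.keys()}
    return {k: c for k, c in out.items() if c != 0}


def represent(f: Poly) -> Term:
    """R(F) for nonnegative F: peel off the least variable one unit at a time."""
    assert all(c >= 0 for c in f.values()), "R is only defined on Poly+"
    variables = [x for x, c in f.items() if isinstance(x, str) and c != 0]
    if not variables:
        return f.get(ONE, 0)          # constant polynomial m gives the numeral m
    a = min(variables)                # assumption: ⊏ is Python string ordering
    rest = dict(f)
    rest[a] -= 1                      # F - a, still nonnegative
    if rest[a] == 0:
        del rest[a]
    return ("+", a, represent(rest))  # a + R(F - a)


F = {ONE: 2, "b": 1, "a": 2}          # 2 + 2a + b
t = represent(F)
assert t == ("+", "a", ("+", "a", ("+", "b", 2)))
assert interp(t) == F                 # Lemma 4.3 on this example
```

Because the least variable is always chosen first, the output does not depend on the insertion order of the dictionary, only on the chosen ordering of variable names; this is the sensitivity to variable names discussed in the aside below.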
The representation of polynomials as terms is a right inverse of the interpretation of terms as polynomials, and a left inverse up to some syntactic manipulation.

Lemma 4.3 For any nonnegative polynomial $F$, $\llbracket \mathcal{R}(F) \rrbracket = F$.

Proof: By induction on $F$.

Lemma 4.4 If $\Delta \vdash F$ and $\Delta \vdash G$, then $\Delta \vdash \mathcal{R}(F + G) = \mathcal{R}(F) + \mathcal{R}(G)\ \mathit{true}$.

Proof: By induction on $F$.

Lemma 4.5 If $\Delta \vdash t : \mathbb{N}$, then $\Delta \vdash \mathcal{R}\llbracket t \rrbracket = t\ \mathit{true}$. It follows that for well-formed terms $t$ and $u$, if $\llbracket t \rrbracket = \llbracket u \rrbracket$ then $\Delta \vdash t = u\ \mathit{true}$.

Proof: By induction on (the kinding of) $t$, using Lemma 4.4.

As an aside, note that the ordering on $\mathrm{Var}$ is necessary only to make the choice of $a$ in the second part of the definition of $\mathcal{R}$ unique; one could do without it entirely by defining $\mathcal{R}$ as a one-to-many relation rather than a function. This would make the proofs of these last three lemmas somewhat awkward, but would render the canonical form $\mathcal{R}\llbracket t \rrbracket$ insensitive to the names of the free variables in $t$. Since simple formulas contain no bound variables, I consider sacrificing this insensitivity a reasonable tradeoff.

4.2.3  Interpretation  of  Formulas 

Next, I define an interpretation of formulas in the constraint logic as constraints on linear polynomials.

Definition 4.5 A polynomial constraint is an assertion of the form $P = 0$ or $P \le 0$, where $P$ is a linear polynomial (not necessarily nonnegative).

Definition 4.6 The interpretation $\llbracket\cdot\rrbracket : \mathit{Form} \to \mathit{PConstr}$ of formulas as polynomial constraints is defined as follows:

$$\llbracket t_1 \le t_2 \rrbracket = (\llbracket t_1 \rrbracket - \llbracket t_2 \rrbracket \le 0) \qquad \llbracket t_1 = t_2 \rrbracket = (\llbracket t_1 \rrbracket - \llbracket t_2 \rrbracket = 0)$$

Because of the reflexivity and antisymmetry rules for inequality in the constraint logic, the derivability of any equality formula $t_1 = t_2$ is equivalent to the derivability of both inequalities $t_1 \le t_2$ and $t_2 \le t_1$; furthermore, exactly the same formulas can be derived in a context containing an equality hypothesis as in the context that contains the inequality in both directions instead. Thus, as I will show, it suffices to restrict our attention to inequality formulas. First, though, I must define a representation function mapping polynomial constraints to constraint formulas, analogous to the one for terms. The first step in defining this mapping is to show how to decompose the polynomial on the left side of a polynomial constraint into the two nonnegative polynomials representing the terms on both sides of an inequality formula.

Definition 4.7 If $P$ is a linear polynomial, then its left-hand part $P_L$ and right-hand part $P_R$ are defined as follows:

$$P_L(x) = \begin{cases} P(x) & \text{if } P(x) > 0 \\ 0 & \text{otherwise} \end{cases} \qquad P_R(x) = \begin{cases} -P(x) & \text{if } P(x) < 0 \\ 0 & \text{otherwise} \end{cases}$$

That is, $P_L$ consists of the terms of $P$ with positive coefficients, and $P_R$ consists of the terms of $P$ with negative coefficients, negated.
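Continuing the hypothetical Python encoding from the earlier sketches, the split into left-hand and right-hand parts is a single pass over the coefficients. The assertions check, on one example, the three properties that Lemma 4.6 shows characterize this decomposition: both parts are nonnegative, $P = P_L - P_R$, and the parts have disjoint domains. The function name `split` is illustrative.

```python
from typing import Dict, Tuple, Union

ONE = 1  # hypothetical key for the constant coefficient
Poly = Dict[Union[int, str], int]


def split(p: Poly) -> Tuple[Poly, Poly]:
    """Return (P_L, P_R): the positive coefficients, and the negated negative ones."""
    p_left = {x: c for x, c in p.items() if c > 0}
    p_right = {x: -c for x, c in p.items() if c < 0}
    return p_left, p_right


def poly_sub(p: Poly, q: Poly) -> Poly:
    out = {k: p.get(k, 0) - q.get(k, 0) for k in p.keys() | q.keys()}
    return {k: c for k, c in out.items() if c != 0}


# P = -3 + 2a - b, i.e. the constraint 2a - b - 3 <= 0, read back as 2a <= 3 + b
P = {ONE: -3, "a": 2, "b": -1}
PL, PR = split(P)
assert PL == {"a": 2} and PR == {ONE: 3, "b": 1}
assert all(c >= 0 for c in PL.values()) and all(c >= 0 for c in PR.values())
assert poly_sub(PL, PR) == P          # P = P_L - P_R
assert not (PL.keys() & PR.keys())    # "disjoint domains"
```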
The decomposition into left-hand and right-hand parts is unique, in a sense made precise by the following lemma.

Lemma 4.6 Let $P$, $P^*_L$ and $P^*_R$ be linear polynomials. Then $P^*_L = P_L$ and $P^*_R = P_R$ iff all three of the following are true:

1. $P^*_L$ and $P^*_R$ are both nonnegative;

2. $P = P^*_L - P^*_R$; and

3. For any $x \in \mathrm{Var} \cup \{1\}$, either $P^*_L(x) = 0$ or $P^*_R(x) = 0$.

Proof:

($\Rightarrow$): That conditions (1) and (2) hold for $P_L$ and $P_R$ is obvious. For condition (3), suppose $x \in \mathrm{Var} \cup \{1\}$. If $P(x) \le 0$, then by definition $P_L(x) = 0$; if $P(x) \ge 0$, then by definition $P_R(x) = 0$.

($\Leftarrow$): Assume conditions (1)-(3) hold. I need to show that $P^*_L = P_L$ and $P^*_R = P_R$. So, suppose $x \in \mathrm{Var} \cup \{1\}$. Condition (3) gives two cases:

Case: $P^*_L(x) = 0$. By condition (2), $P(x) = -P^*_R(x)$. By condition (1), this means $P(x) \le 0$. Hence $P_L(x) = 0 = P^*_L(x)$ and $P_R(x) = -P(x) = P^*_R(x)$.

Case: $P^*_R(x) = 0$. By condition (2), $P(x) = P^*_L(x)$. By condition (1), then, $P(x) \ge 0$. Hence $P_L(x) = P(x) = P^*_L(x)$ and $P_R(x) = 0 = P^*_R(x)$.

End of Proof.

Henceforth I will gloss the third, somewhat awkward, condition by saying that $P^*_L$ and $P^*_R$ have "disjoint domains".

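Definition 4.8 The syntactic representation $\mathcal{R} : \mathit{PConstr} \to \mathit{Form}$ of polynomial constraints as formulas is defined by:

$$\mathcal{R}(P \le 0) = (\mathcal{R}(P_L) \le \mathcal{R}(P_R))$$

Lemma 4.7 For any polynomial $P$, $\llbracket \mathcal{R}(P \le 0) \rrbracket = (P \le 0)$.

Proof: Direct, using Lemma 4.3:

$$\llbracket \mathcal{R}(P \le 0) \rrbracket = \llbracket \mathcal{R}(P_L) \le \mathcal{R}(P_R) \rrbracket = (\llbracket \mathcal{R}(P_L) \rrbracket - \llbracket \mathcal{R}(P_R) \rrbracket \le 0) = (P_L - P_R \le 0) = (P \le 0)$$

End of Proof.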

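Lemma 4.8 If $\Delta \vdash t : \mathbb{N}$ and $\Delta \vdash u : \mathbb{N}$, then $t \le u$ and $\mathcal{R}\llbracket t \le u \rrbracket$ are interderivable in context $\Delta$. That is,

1. $\Delta, (t \le u) \vdash \mathcal{R}\llbracket t \le u \rrbracket\ \mathit{true}$ and

2. $\Delta, \mathcal{R}\llbracket t \le u \rrbracket \vdash t \le u\ \mathit{true}$.

Proof:

Let $F = \llbracket t \rrbracket$ and $G = \llbracket u \rrbracket$. Then $\mathcal{R}\llbracket t \le u \rrbracket = \mathcal{R}(F - G \le 0) = (\mathcal{R}((F - G)_L) \le \mathcal{R}((F - G)_R))$. Observe that

$$F - G = ((F \sqcap G) + (F \ominus G)) - ((F \sqcap G) + (G \ominus F)) = (F \ominus G) - (G \ominus F)$$

But $F \ominus G$ and $G \ominus F$ are both nonnegative and have disjoint domains, so it follows by Lemma 4.6 that $(F - G)_L = F \ominus G$ and $(F - G)_R = G \ominus F$. Hence $\mathcal{R}\llbracket t \le u \rrbracket = (\mathcal{R}(F \ominus G) \le \mathcal{R}(G \ominus F))$.

So, to prove part (1), let $\Delta' = (\Delta, t \le u)$.

By the hypothesis rule, $\Delta' \vdash t \le u\ \mathit{true}$.

Observe that $\llbracket t \rrbracket = F = (F \sqcap G) + (F \ominus G)$ and similarly $\llbracket u \rrbracket = G = (F \sqcap G) + (G \ominus F)$.

By Lemma 4.4, $\Delta' \vdash \mathcal{R}(F \sqcap G) + \mathcal{R}(F \ominus G) = \mathcal{R}\llbracket t \rrbracket\ \mathit{true}$ and $\Delta' \vdash \mathcal{R}\llbracket u \rrbracket = \mathcal{R}(F \sqcap G) + \mathcal{R}(G \ominus F)\ \mathit{true}$.
Applying Lemma 4.5, $\Delta' \vdash \mathcal{R}(F \sqcap G) + \mathcal{R}(F \ominus G) = t\ \mathit{true}$ and $\Delta' \vdash u = \mathcal{R}(F \sqcap G) + \mathcal{R}(G \ominus F)\ \mathit{true}$.
By reflexivity and transitivity, $\Delta' \vdash \mathcal{R}(F \sqcap G) + \mathcal{R}(F \ominus G) \le \mathcal{R}(F \sqcap G) + \mathcal{R}(G \ominus F)\ \mathit{true}$.

By the cancellation rule, $\Delta' \vdash \mathcal{R}(F \ominus G) \le \mathcal{R}(G \ominus F)\ \mathit{true}$.

That is, $\Delta' \vdash \mathcal{R}\llbracket t \le u \rrbracket\ \mathit{true}$.

To prove part (2), let $\Delta'' = (\Delta, \mathcal{R}\llbracket t \le u \rrbracket)$.

By the hypothesis rule, $\Delta'' \vdash \mathcal{R}\llbracket t \le u \rrbracket\ \mathit{true}$.

That is, $\Delta'' \vdash \mathcal{R}(F \ominus G) \le \mathcal{R}(G \ominus F)\ \mathit{true}$.

Clearly $\Delta'' \vdash \mathcal{R}(F \sqcap G) : \mathbb{N}$; therefore, by the reflexivity rules, $\Delta'' \vdash \mathcal{R}(F \sqcap G) \le \mathcal{R}(F \sqcap G)\ \mathit{true}$.

By the monotonicity rule, $\Delta'' \vdash \mathcal{R}(F \sqcap G) + \mathcal{R}(F \ominus G) \le \mathcal{R}(F \sqcap G) + \mathcal{R}(G \ominus F)\ \mathit{true}$.

Reasoning as in part (1), $\Delta'' \vdash t = \mathcal{R}(F \sqcap G) + \mathcal{R}(F \ominus G)\ \mathit{true}$ and $\Delta'' \vdash \mathcal{R}(F \sqcap G) + \mathcal{R}(G \ominus F) = u\ \mathit{true}$.
By reflexivity and transitivity, $\Delta'' \vdash t \le u\ \mathit{true}$.

End of Proof.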
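The key algebraic step in this proof, $(F - G)_L = F \ominus G$ and $(F - G)_R = G \ominus F$, can be spot-checked mechanically. The following Python sketch reuses the hypothetical dictionary encoding from the earlier sketches, with illustrative helper names, and checks the identity exhaustively for all nonnegative polynomials over the constant and one variable $a$ with coefficients below 4. It is a sanity check on small cases, not a substitute for the argument via Lemma 4.6.

```python
from itertools import product
from typing import Dict, Tuple, Union

ONE = 1  # hypothetical key for the constant coefficient
Poly = Dict[Union[int, str], int]


def norm(p: Poly) -> Poly:
    return {k: c for k, c in p.items() if c != 0}


def poly_sub(p: Poly, q: Poly) -> Poly:
    return norm({k: p.get(k, 0) - q.get(k, 0) for k in p.keys() | q.keys()})


def bsub(p: Poly, q: Poly) -> Poly:
    """(P ⊖ Q)(x) = P(x) - min(P(x), Q(x))."""
    return norm({k: p.get(k, 0) - min(p.get(k, 0), q.get(k, 0))
                 for k in p.keys() | q.keys()})


def split(p: Poly) -> Tuple[Poly, Poly]:
    """(P_L, P_R) as in Definition 4.7."""
    return ({x: c for x, c in p.items() if c > 0},
            {x: -c for x, c in p.items() if c < 0})


checked = 0
for f1, fa, g1, ga in product(range(4), repeat=4):
    F = norm({ONE: f1, "a": fa})
    G = norm({ONE: g1, "a": ga})
    left, right = split(poly_sub(F, G))
    assert left == bsub(F, G) and right == bsub(G, F)
    checked += 1
print(f"checked {checked} pairs")
```

Running it prints `checked 256 pairs`.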

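Lemma 4.9 (Addition of Provable Constraints) If $\Delta \vdash \mathcal{R}(P \le 0)\ \mathit{true}$ and $\Delta \vdash \mathcal{R}(Q \le 0)\ \mathit{true}$ then $\Delta \vdash \mathcal{R}(P + Q \le 0)\ \mathit{true}$.

Proof:

By assumption, $\Delta \vdash \mathcal{R}(P_L) \le \mathcal{R}(P_R)\ \mathit{true}$ and $\Delta \vdash \mathcal{R}(Q_L) \le \mathcal{R}(Q_R)\ \mathit{true}$.

By the monotonicity rule, $\Delta \vdash \mathcal{R}(P_L) + \mathcal{R}(Q_L) \le \mathcal{R}(P_R) + \mathcal{R}(Q_R)\ \mathit{true}$.

Using Lemma 4.4 and the reflexivity and transitivity rules, $\Delta \vdash \mathcal{R}(P_L + Q_L) \le \mathcal{R}(P_R + Q_R)\ \mathit{true}$.
By Lemma 4.8 and Proposition 4.1, $\Delta \vdash \mathcal{R}\llbracket \mathcal{R}(P_L + Q_L) \le \mathcal{R}(P_R + Q_R) \rrbracket\ \mathit{true}$.

By definition, $\llbracket \mathcal{R}(P_L + Q_L) \le \mathcal{R}(P_R + Q_R) \rrbracket = (\llbracket \mathcal{R}(P_L + Q_L) \rrbracket - \llbracket \mathcal{R}(P_R + Q_R) \rrbracket \le 0)$.

Applying Lemma 4.3,

$$(\llbracket \mathcal{R}(P_L + Q_L) \rrbracket - \llbracket \mathcal{R}(P_R + Q_R) \rrbracket \le 0) = ((P_L + Q_L) - (P_R + Q_R) \le 0) = ((P_L - P_R) + (Q_L - Q_R) \le 0) = (P + Q \le 0)$$

Thus I have $\Delta \vdash \mathcal{R}(P + Q \le 0)\ \mathit{true}$, as desired.

End of Proof.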


4.2.4  Semantic  Proofs 

To finish setting up the proof of decidability, I now define a notion of "proof" for polynomial constraints. After proving that the syntactic provable inequality relation of TALT-R and this new semantic provable inequality relation correspond, I will show that the existence of a semantic proof for a given constraint is decidable.

Definition 4.9 The interpretation of contexts as sets of polynomial constraints is defined as follows:

$$\llbracket \Delta \rrbracket = \{\llbracket t_1 \le t_2 \rrbracket \mid (t_1 \le t_2) \in \Delta\} \cup \{\llbracket t_1 \le t_2 \rrbracket \mid (t_1 = t_2) \in \Delta\} \cup \{\llbracket t_2 \le t_1 \rrbracket \mid (t_1 = t_2) \in \Delta\}$$

Definition 4.10 A semantic proof $M = (A, F)$ consists of a finite multiset $A$ of linear polynomials and a nonnegative linear polynomial $F$. The yield of $M$ (written $\sum M$) is defined as

$$\textstyle\sum M = \Big(\sum_{R \in A} R\Big) - F$$

We say that $M$ is a semantic proof of $P \le 0$ in context $\Delta$ (written $\Delta \models M : P \le 0$) if $\sum M = P$ and, for every $R \in A$, $(R \le 0) \in \llbracket \Delta \rrbracket$.
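Checking a candidate semantic proof is straightforward, which is much of what makes the notion useful. The Python sketch below assumes, for illustration, the context $\Delta = (a \le 2, b \le a)$, so that $\llbracket \Delta \rrbracket$ contains $a - 2 \le 0$ and $b - a \le 0$. The multiset $A$ is represented as a Python list, and the names `proof_yield` and `checks` are hypothetical.

```python
from typing import Dict, List, Union

ONE = 1  # hypothetical key for the constant coefficient
Poly = Dict[Union[int, str], int]


def norm(p: Poly) -> Poly:
    return {k: c for k, c in p.items() if c != 0}


def poly_add(p: Poly, q: Poly) -> Poly:
    return norm({k: p.get(k, 0) + q.get(k, 0) for k in p.keys() | q.keys()})


def poly_sub(p: Poly, q: Poly) -> Poly:
    return norm({k: p.get(k, 0) - q.get(k, 0) for k in p.keys() | q.keys()})


def proof_yield(A: List[Poly], F: Poly) -> Poly:
    """The yield: the sum of the multiset A, minus F."""
    total: Poly = {}
    for R in A:
        total = poly_add(total, R)
    return poly_sub(total, F)


def checks(ctx: List[Poly], A: List[Poly], F: Poly, P: Poly) -> bool:
    """Delta |= (A, F) : P <= 0, where ctx lists each H with (H <= 0) in [[Delta]]."""
    return (all(c >= 0 for c in F.values())
            and all(R in ctx for R in A)
            and proof_yield(A, F) == norm(P))


H1 = {"a": 1, ONE: -2}   # a - 2 <= 0, from a <= 2
H2 = {"b": 1, "a": -1}   # b - a <= 0, from b <= a
ctx = [H1, H2]

goal = {"b": 1, ONE: -3}                        # [[b <= 3]] = (b - 3 <= 0)
assert checks(ctx, [H1, H2], {ONE: 1}, goal)    # (a - 2) + (b - a) - 1 = b - 3
assert not checks(ctx, [H1], {}, goal)          # yield a - 2 is not b - 3
```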

It  is  easy  to  show  that  every  semantic  proof  corresponds  to  a  syntactic  proof.  First,  I  prove 
a  lemma  showing  that  each  constraint  in  the  interpretation  of  a  context  is  syntactically  provable, 
then  I  prove  the  main  soundness  lemma. 

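Lemma 4.10 If $(P \le 0) \in \llbracket \Delta \rrbracket$, then $\Delta \vdash \mathcal{R}(P \le 0)\ \mathit{true}$.

Proof:

There are three cases.

Case 1: $(P \le 0) = \llbracket t \le u \rrbracket$ and $(t \le u) \in \Delta$. By the hypothesis rule, $\Delta \vdash t \le u\ \mathit{true}$. By Lemma 4.8 and Proposition 4.1, $\Delta \vdash \mathcal{R}\llbracket t \le u \rrbracket\ \mathit{true}$, that is, $\Delta \vdash \mathcal{R}(P \le 0)\ \mathit{true}$.

Case 2: $(P \le 0) = \llbracket t \le u \rrbracket$ and $(t = u) \in \Delta$. By the hypothesis rule, $\Delta \vdash t = u\ \mathit{true}$. By the reflexivity rule, $\Delta \vdash t \le u\ \mathit{true}$. The result follows as in Case 1.

Case 3: $(P \le 0) = \llbracket u \le t \rrbracket$ and $(t = u) \in \Delta$. By the hypothesis rule, $\Delta \vdash t = u\ \mathit{true}$. By the symmetry rule for equality, $\Delta \vdash u = t\ \mathit{true}$. By the reflexivity rule for $\le$, $\Delta \vdash u \le t\ \mathit{true}$. The result follows as in Case 1.

End of Proof.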

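Lemma 4.11 (Soundness)

1. If $\Delta \models M : \llbracket t \le u \rrbracket$, then $\Delta \vdash t \le u\ \mathit{true}$.

2. If $\Delta \models M : \llbracket t \le u \rrbracket$ and $\Delta \models M' : \llbracket u \le t \rrbracket$ then $\Delta \vdash t = u\ \mathit{true}$.

Proof: The proof of part (2) follows from part (1) by antisymmetry.

For part (1), let $M = (A, F)$ and $P = \sum M = \sum A - F$. I need to show $\Delta \vdash t \le u\ \mathit{true}$; by Lemma 4.8 and Proposition 4.1 it suffices to show that $\Delta \vdash \mathcal{R}\llbracket t \le u \rrbracket\ \mathit{true}$, that is, that $\Delta \vdash \mathcal{R}(P \le 0)\ \mathit{true}$.

To begin, notice that since $F$ is nonnegative, $(-F)_L = 0$ and $(-F)_R = F$. By the nonnegativity rule, $\Delta \vdash 0 \le \mathcal{R}(F)\ \mathit{true}$, i.e., $\Delta \vdash \mathcal{R}(-F \le 0)\ \mathit{true}$.

Also note that for every $R \in A$, $(R \le 0) \in \llbracket \Delta \rrbracket$ and thus $\Delta \vdash \mathcal{R}(R \le 0)\ \mathit{true}$ by Lemma 4.10. Since $A$ is finite, repeated application of Lemma 4.9 gives $\Delta \vdash \mathcal{R}(\sum A \le 0)\ \mathit{true}$. Using Lemma 4.9 once more, $\Delta \vdash \mathcal{R}(\sum A - F \le 0)\ \mathit{true}$, i.e., $\Delta \vdash \mathcal{R}(P \le 0)\ \mathit{true}$.

End of Proof.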

Showing  that  anything  syntactically  provable  is  semantically  provable  is  a  bit  more  interesting. 

Lemma 4.12 If $\Delta \models M : P \le 0$ and $\Delta \models M' : Q \le 0$ then there exists an $N$ such that $\Delta \models N : (P + Q) \le 0$.

Proof:

Suppose $M = (A, F)$ and $M' = (A', F')$. Then let $N = (A \uplus A', F + F')$, and observe that

$$\textstyle\sum N = \sum (A \uplus A') - (F + F') = \sum A + \sum A' - (F + F') = (\sum A - F) + (\sum A' - F') = P + Q$$

Moreover, $F + F'$ is nonnegative and every element of $A \uplus A'$ comes from $A$ or $A'$. Thus by definition $\Delta \models N : (P + Q) \le 0$.

End of Proof.
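The construction in this proof is just multiset union together with pointwise addition. Reusing the hypothetical Python encoding, redefined here so the block stands alone, the sketch combines a proof of $b - 3 \le 0$ with a proof of $a - 2 \le 0$, both from the illustrative context $a \le 2, b \le a$, and confirms that the yield of the combined proof is the sum of the two goals, $a + b - 5$. The name `combine` is hypothetical.

```python
from typing import Dict, List, Tuple, Union

ONE = 1  # hypothetical key for the constant coefficient
Poly = Dict[Union[int, str], int]
Proof = Tuple[List[Poly], Poly]  # (multiset A as a list, nonnegative F)


def norm(p: Poly) -> Poly:
    return {k: c for k, c in p.items() if c != 0}


def poly_add(p: Poly, q: Poly) -> Poly:
    return norm({k: p.get(k, 0) + q.get(k, 0) for k in p.keys() | q.keys()})


def poly_sub(p: Poly, q: Poly) -> Poly:
    return norm({k: p.get(k, 0) - q.get(k, 0) for k in p.keys() | q.keys()})


def proof_yield(proof: Proof) -> Poly:
    A, F = proof
    total: Poly = {}
    for R in A:
        total = poly_add(total, R)
    return poly_sub(total, F)


def combine(m: Proof, m2: Proof) -> Proof:
    """Lemma 4.12: N = (A ⊎ A', F + F'); list concatenation is multiset union."""
    (A, F), (A2, F2) = m, m2
    return (A + A2, poly_add(F, F2))


H1 = {"a": 1, ONE: -2}     # a - 2 <= 0
H2 = {"b": 1, "a": -1}     # b - a <= 0
M = ([H1, H2], {ONE: 1})   # proves b - 3 <= 0
M2 = ([H1], {})            # proves a - 2 <= 0
N = combine(M, M2)
assert proof_yield(N) == poly_add(proof_yield(M), proof_yield(M2))
assert proof_yield(N) == {"a": 1, "b": 1, ONE: -5}
```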

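Lemma 4.13 (Completeness)

• If $\Delta \vdash t \le u\ \mathit{true}$ then there is an $M$ such that $\Delta \models M : \llbracket t \le u \rrbracket$.

• If $\Delta \vdash t = u\ \mathit{true}$ then there is an $M$ such that $\Delta \models M : \llbracket t \le u \rrbracket$ and an $M'$ such that $\Delta \models M' : \llbracket u \le t \rrbracket$.

Proof:

I will prove both parts simultaneously by induction on derivations.

Case:

$$\frac{}{\Delta \vdash \varphi\ \mathit{true}}\ ((\varphi\ \mathit{true}) \in \Delta)$$

Sub-case: $\varphi = (t \le u)$. Then $\llbracket t \le u \rrbracket \in \llbracket \Delta \rrbracket$ and so $\Delta \models (\{\llbracket t \rrbracket - \llbracket u \rrbracket\}, 0) : \llbracket t \le u \rrbracket$.

Sub-case: $\varphi = (t = u)$. Then $\llbracket t \le u \rrbracket$ and $\llbracket u \le t \rrbracket$ are both in $\llbracket \Delta \rrbracket$. Thus $\Delta \models (\{\llbracket t \rrbracket - \llbracket u \rrbracket\}, 0) : \llbracket t \le u \rrbracket$ and $\Delta \models (\{\llbracket u \rrbracket - \llbracket t \rrbracket\}, 0) : \llbracket u \le t \rrbracket$.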


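Case:

$$\frac{\Delta \vdash t_2 = t_1\ \mathit{true}}{\Delta \vdash t_1 = t_2\ \mathit{true}} \qquad \frac{\Delta \vdash t_1 = t_2\ \mathit{true}}{\Delta \vdash t_1 \le t_2\ \mathit{true}} \qquad \frac{\Delta \vdash t_1 \le t_2\ \mathit{true} \quad \Delta \vdash t_2 \le t_1\ \mathit{true}}{\Delta \vdash t_1 = t_2\ \mathit{true}}$$

Each of these cases follows immediately from the induction hypothesis.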


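Case:

$$\frac{}{\Delta \vdash m + n = \overline{m + n}\ \mathit{true}} \qquad \frac{\Delta \vdash t : \mathbb{N}}{\Delta \vdash 0 + t = t\ \mathit{true}} \qquad \frac{\Delta \vdash t_1 : \mathbb{N} \quad \Delta \vdash t_2 : \mathbb{N}}{\Delta \vdash t_1 + t_2 = t_2 + t_1\ \mathit{true}}$$

$$\frac{\Delta \vdash t : \mathbb{N}}{\Delta \vdash t = t\ \mathit{true}} \qquad \frac{\Delta \vdash t_i : \mathbb{N}\ (\text{for } i = 1, 2, 3)}{\Delta \vdash (t_1 + t_2) + t_3 = t_1 + (t_2 + t_3)\ \mathit{true}}$$

(In the first rule, $\overline{m + n}$ denotes the single numeral for the sum of $m$ and $n$.)

In each of these rules the formula being proved has the form $t = u$ where $\llbracket t \rrbracket = \llbracket u \rrbracket$. Thus in each case, $\llbracket t \le u \rrbracket = \llbracket u \le t \rrbracket = (0 \le 0)$, and so $(\emptyset, 0)$ is a semantic proof of either direction.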


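Case:

$$\frac{\Delta \vdash t_1 = t_1'\ \mathit{true} \quad \Delta \vdash t_2 = t_2'\ \mathit{true}}{\Delta \vdash t_1 + t_2 = t_1' + t_2'\ \mathit{true}}$$

By the induction hypothesis, $\llbracket t_1 \le t_1' \rrbracket$ and $\llbracket t_2 \le t_2' \rrbracket$ have semantic proofs in $\Delta$. That is, there exist semantic proofs of $(\llbracket t_1 \rrbracket - \llbracket t_1' \rrbracket \le 0)$ and $(\llbracket t_2 \rrbracket - \llbracket t_2' \rrbracket \le 0)$. By Lemma 4.12, $(\llbracket t_1 \rrbracket - \llbracket t_1' \rrbracket + \llbracket t_2 \rrbracket - \llbracket t_2' \rrbracket \le 0)$ is semantically provable. But this constraint is the same as $(\llbracket t_1 + t_2 \rrbracket - \llbracket t_1' + t_2' \rrbracket \le 0)$, that is, $\llbracket t_1 + t_2 \le t_1' + t_2' \rrbracket$. The other direction is the same.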


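Case:

$$\frac{\Delta \vdash t_1 = t_3\ \mathit{true} \quad \Delta \vdash t_3 = t_2\ \mathit{true}}{\Delta \vdash t_1 = t_2\ \mathit{true}}$$

I will show that $\llbracket t_1 \le t_2 \rrbracket$ is semantically provable; the other direction is similar.

By the induction hypothesis, there exist semantic proofs of $\llbracket t_1 \le t_3 \rrbracket$ and $\llbracket t_3 \le t_2 \rrbracket$, that is, of $(\llbracket t_1 \rrbracket - \llbracket t_3 \rrbracket \le 0)$ and $(\llbracket t_3 \rrbracket - \llbracket t_2 \rrbracket \le 0)$.

By Lemma 4.12, there is a semantic proof of $(\llbracket t_1 \rrbracket - \llbracket t_3 \rrbracket + \llbracket t_3 \rrbracket - \llbracket t_2 \rrbracket \le 0)$, that is, of $(\llbracket t_1 \rrbracket - \llbracket t_2 \rrbracket \le 0)$, which in turn is the same as $\llbracket t_1 \le t_2 \rrbracket$.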


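Case:

$$\frac{\Delta \vdash t_1 \le t_3\ \mathit{true} \quad \Delta \vdash t_3 \le t_2\ \mathit{true}}{\Delta \vdash t_1 \le t_2\ \mathit{true}}$$

By the induction hypothesis, there exist semantic proofs of $\llbracket t_1 \le t_3 \rrbracket$ and $\llbracket t_3 \le t_2 \rrbracket$, that is, of $(\llbracket t_1 \rrbracket - \llbracket t_3 \rrbracket \le 0)$ and $(\llbracket t_3 \rrbracket - \llbracket t_2 \rrbracket \le 0)$.

By Lemma 4.12, $(\llbracket t_1 \rrbracket - \llbracket t_3 \rrbracket + \llbracket t_3 \rrbracket - \llbracket t_2 \rrbracket \le 0)$ is semantically provable.

But $\llbracket t_1 \rrbracket - \llbracket t_3 \rrbracket + \llbracket t_3 \rrbracket - \llbracket t_2 \rrbracket = \llbracket t_1 \rrbracket - \llbracket t_2 \rrbracket$, so I have found a semantic proof of $\llbracket t_1 \le t_2 \rrbracket$.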


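Case:

$$\frac{}{\Delta \vdash m \le n\ \mathit{true}}\ (m \le n)$$

Note that $\llbracket m \le n \rrbracket = (m - n \le 0)$. Since $m \le n$, $n - m$ is a nonnegative linear polynomial; hence $\Delta \models (\emptyset, n - m) : \llbracket m \le n \rrbracket$.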


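Case:

$$\frac{\Delta \vdash t_1 \le t_1'\ \mathit{true} \quad \Delta \vdash t_2 \le t_2'\ \mathit{true}}{\Delta \vdash t_1 + t_2 \le t_1' + t_2'\ \mathit{true}}$$

By the induction hypothesis, $\llbracket t_1 \le t_1' \rrbracket$ and $\llbracket t_2 \le t_2' \rrbracket$ are semantically provable.

That is, there exist semantic proofs of $(\llbracket t_1 \rrbracket - \llbracket t_1' \rrbracket \le 0)$ and $(\llbracket t_2 \rrbracket - \llbracket t_2' \rrbracket \le 0)$.

By Lemma 4.12, $(\llbracket t_1 \rrbracket - \llbracket t_1' \rrbracket + \llbracket t_2 \rrbracket - \llbracket t_2' \rrbracket \le 0)$ is semantically provable.

But this constraint is the same as $(\llbracket t_1 + t_2 \rrbracket - \llbracket t_1' + t_2' \rrbracket \le 0)$, that is, $\llbracket t_1 + t_2 \le t_1' + t_2' \rrbracket$.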


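Case:

$$\frac{\Delta \vdash t + t_1 \le t + t_2\ \mathit{true}}{\Delta \vdash t_1 \le t_2\ \mathit{true}}$$

By the induction hypothesis, $\llbracket t + t_1 \le t + t_2 \rrbracket$ is semantically provable.

By the definitions of the interpretation functions $\llbracket\cdot\rrbracket$ for terms and formulas,

$$\llbracket t + t_1 \le t + t_2 \rrbracket = (\llbracket t \rrbracket + \llbracket t_1 \rrbracket - \llbracket t \rrbracket - \llbracket t_2 \rrbracket \le 0) = (\llbracket t_1 \rrbracket - \llbracket t_2 \rrbracket \le 0) = \llbracket t_1 \le t_2 \rrbracket$$

Thus $\llbracket t_1 \le t_2 \rrbracket$ is semantically provable.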


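Case:

$$\frac{\Delta \vdash t : \mathbb{N}}{\Delta \vdash 0 \le t\ \mathit{true}}$$

Observe that $\llbracket 0 \le t \rrbracket = (-\llbracket t \rrbracket \le 0)$; thus $\Delta \models (\emptyset, \llbracket t \rrbracket) : \llbracket 0 \le t \rrbracket$.

End of Proof.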
4.2.5 The Decidability Theorem

The interpretation of terms as polynomials effectively gives a representation of terms that identifies those that differ only by rearrangement of terms and computation with numerals. The interpretation of formulas as polynomial constraints identifies those formulas that differ only by rearrangement of terms, computation with numerals, and cancellation. The only proof rules not covered are the hypothesis rule, the rules for adding equations or inequalities together, and the one stating that any term is greater than or equal to zero; thus the semantic content of a proof essentially consists of a collection of hypotheses to be added together, along with an axiomatically true formula of the form $0 \le t$. Intuitively speaking, therefore, I have reduced the question of whether a formula is provable to the question of whether a certain polynomial can be obtained by adding together copies of certain others and subtracting a nonnegative polynomial. This latter question is decidable, as the following theorem shows.

Theorem 4.1 (Decidability) The (derivability of the) judgment $\Delta \vdash \varphi\ \mathit{true}$ is decidable.

Proof:

If $\varphi = (t = u)$, then it suffices to decide whether both $\Delta \vdash t \le u\ \mathit{true}$ and $\Delta \vdash u \le t\ \mathit{true}$. If both of these hold, then so does $\Delta \vdash \varphi\ \mathit{true}$; if either does not hold, then neither does the judgment of interest.

If $\varphi = (t \le u)$, then suppose $\llbracket t \le u \rrbracket = (P \le 0)$; by Lemmas 4.11 and 4.13 it suffices to decide whether there exists a semantic proof $M$ such that $\Delta \models M : P \le 0$. Suppose $\llbracket \Delta \rrbracket = \{H_1 \le 0, \ldots, H_n \le 0\}$; then I claim that $(P \le 0)$ is semantically provable iff there are natural numbers $x_1, \ldots, x_n$ satisfying the system of constraints:

$$\begin{aligned} H_1(1)\,x_1 + \cdots + H_n(1)\,x_n &\ge P(1) \\ H_1(a_1)\,x_1 + \cdots + H_n(a_1)\,x_n &\ge P(a_1) \\ &\;\;\vdots \\ H_1(a_m)\,x_1 + \cdots + H_n(a_m)\,x_n &\ge P(a_m) \end{aligned}$$

where

$$\{a_1, \ldots, a_m\} = \mathrm{dom}(P) \cup \bigcup_{i=1}^{n} \mathrm{dom}(H_i)$$

(i.e., $a_1, \ldots, a_m$ are all the variables appearing in the judgment to be decided). Therefore, to decide whether $\Delta \vdash t \le u\ \mathit{true}$ is derivable it suffices to generate this system of constraints and solve it using an algorithm for Integer Programming. Generating the constraints poses no difficulty, since the operations on polynomials required to extract $P$ from $t$ and $u$, and the $H_i$'s from $\Delta$, are certainly computable.

Now I must prove both directions of my claim. First, suppose that $M = (A, F)$ is a semantic proof of $(P \le 0)$. Then each polynomial $R$ in the multiset $A$ is equal to $H_i$ for some $i$. So, for $1 \le i \le n$, let $x_i$ be the multiplicity of $H_i$ in $A$; then

$$P = \textstyle\sum M = \sum A - F = x_1 H_1 + \cdots + x_n H_n - F$$

Since $F$ is nonnegative, $x_1, \ldots, x_n$ satisfy the constraints above.

Now, suppose that $x_1, \ldots, x_n$ satisfy the constraints. To produce a semantic proof of $P \le 0$, let $A$ be the multiset containing each $H_i$ with multiplicity $x_i$, and define $F(x) = (\sum A)(x) - P(x)$ for every $x \in \mathrm{Var} \cup \{1\}$. Then clearly $\sum A - F = P$, so it remains only to show that $F$ is nonnegative. Clearly $F(z) = 0$ for any $z \in \mathrm{Var} \setminus \{a_1, \ldots, a_m\}$. On the other hand, if $z \in \{1, a_1, \ldots, a_m\}$ then one of the constraints gives $(\sum A)(z) \ge P(z)$ and so $F(z) \ge 0$. Thus $M = (A, F)$ is a semantic proof and $\Delta \models M : \llbracket t \le u \rrbracket$.

End of Proof.
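The reduction to Integer Programming can be illustrated with a small Python sketch. It builds one constraint row per point in $\{1, a_1, \ldots, a_m\}$, exactly as in the proof above, and then, in place of a real Integer Programming solver, searches naively for a solution in which every $x_i$ is at most a fixed bound. That bound and the brute-force search are illustrative assumptions, a stand-in rather than a decision procedure. The example context $a \le 2, b \le a$, the goal $a + b \le 4$, and all function names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple, Union

ONE = 1  # hypothetical key for the constant coefficient
Poly = Dict[Union[int, str], int]
Row = Tuple[List[int], int]  # (coefficients H_1(z) .. H_n(z), right-hand side P(z))


def ip_system(hyps: List[Poly], P: Poly) -> List[Row]:
    """One row per point z in {1, a_1, ..., a_m}: sum_i H_i(z) x_i >= P(z)."""
    variables = set()
    for p in hyps + [P]:
        variables |= {x for x in p if isinstance(x, str)}
    points = [ONE] + sorted(variables)
    return [([H.get(z, 0) for H in hyps], P.get(z, 0)) for z in points]


def solve_bounded(rows: List[Row], n: int, bound: int) -> Optional[Tuple[int, ...]]:
    """Naive stand-in for an IP solver: try every x in {0, ..., bound}^n."""
    for xs in product(range(bound + 1), repeat=n):
        if all(sum(h * x for h, x in zip(coeffs, xs)) >= rhs for coeffs, rhs in rows):
            return xs
    return None


def proof_from_solution(hyps: List[Poly], P: Poly, xs) -> Tuple[List[Poly], Poly]:
    """Build (A, F): A holds x_i copies of H_i, and F = sum(A) - P."""
    A = [H for H, x in zip(hyps, xs) for _ in range(x)]
    total: Poly = {}
    for H in A:
        for k, c in H.items():
            total[k] = total.get(k, 0) + c
    F = {k: total.get(k, 0) - P.get(k, 0) for k in total.keys() | P.keys()}
    return A, {k: c for k, c in F.items() if c != 0}


H1 = {"a": 1, ONE: -2}          # a - 2 <= 0
H2 = {"b": 1, "a": -1}          # b - a <= 0
P = {"a": 1, "b": 1, ONE: -4}   # [[a + b <= 4]]
rows = ip_system([H1, H2], P)
xs = solve_bounded(rows, 2, bound=5)
A, F = proof_from_solution([H1, H2], P, xs)
assert xs == (2, 1) and all(c >= 0 for c in F.values())
```

Here the only solution is $x_1 = 2, x_2 = 1$, corresponding to the semantic proof that adds $a - 2 \le 0$ twice and $b - a \le 0$ once, with $F = 0$.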

4.2.6  Implementation 

I have not implemented this Integer Programming-based decision procedure in the TALT-R assembler. Instead, I implemented a much simpler algorithm that decides a proper subset of the derivable judgments; this subset is sufficiently large that any program generated by my compiler will be accepted by the algorithm.

The  algorithm,  which  I  call  depth-limited  semantic  proof  search,  is  based  on  a  size  measure  for 
semantic  proofs  that  I  call  depth.  Roughly  speaking,  the  depth  of  a  semantic  proof  corresponds  to 
the  number  of  uses  of  the  hypothesis  rule  in  a  syntactic  derivation. 

Definition 4.11 (Depth) If $M = (A, F)$ is a semantic proof, then the depth of $M$ is the cardinality of the multiset $A$. If the constraint $P \le 0$ is semantically provable, then the depth of $(P \le 0)$ (relative to a context $\Delta$) is the minimum depth of any semantic proof of $P \le 0$ (in $\Delta$); otherwise we say $(P \le 0)$ has depth $\infty$. The depth of an inequality formula $t \le u$ is the depth of $\llbracket t \le u \rrbracket$; the depth of an equality formula $t = u$ is the maximum of the depth of $t \le u$ and the depth of $u \le t$.

Clearly, by Lemma 4.11, formulas of any finite depth are provable; thus for any $d$,

$$\mathit{DLP}_d = \{(\Delta, \varphi) \mid \varphi \text{ has depth at most } d \text{ relative to } \Delta\} \subseteq \{(\Delta, \varphi) \mid \Delta \vdash \varphi\ \mathit{true}\}$$

The depth-limited semantic proof search algorithm with limit $d$ decides the set $\mathit{DLP}_d$. Given a context $\Delta$ and a formula $\varphi = (t \le u)$ with $\llbracket \varphi \rrbracket = (P \le 0)$, it enumerates all possible multisets of cardinality $d$ or smaller whose elements are drawn from the polynomials $H$ with $(H \le 0) \in \llbracket \Delta \rrbracket$ (there are finitely many). For each such multiset $A$, it computes the difference $F = \sum A - P$. If $F$ is nonnegative, then $\Delta \models (A, F) : \llbracket \varphi \rrbracket$ and so the depth of $\varphi$ is at most $|A|$, which is at most $d$; thus $(\Delta, \varphi) \in \mathit{DLP}_d$ and the algorithm returns success. If no $A$ makes $\sum A - P$ nonnegative, the algorithm returns failure. (An equality formula is handled by searching in both directions.)
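The depth-limited search can be sketched in Python as follows. It enumerates multisets of hypothesis polynomials in order of size, using `itertools.combinations_with_replacement` over indices, so the first success also reveals the depth of the goal whenever that depth is at most the limit $d$. As in the earlier sketches, the dictionary encoding, the function name `dlp_search`, and the example context $a \le 2, b \le a$ are illustrative assumptions, not the code of the TALT-R assembler.

```python
from itertools import combinations_with_replacement
from typing import Dict, List, Optional, Tuple, Union

ONE = 1  # hypothetical key for the constant coefficient
Poly = Dict[Union[int, str], int]


def poly_sum(ps: List[Poly]) -> Poly:
    total: Poly = {}
    for p in ps:
        for k, c in p.items():
            total[k] = total.get(k, 0) + c
    return total


def dlp_search(hyps: List[Poly], P: Poly, d: int) -> Optional[Tuple[List[Poly], Poly]]:
    """Return a semantic proof (A, F) of P <= 0 with |A| <= d, or None."""
    for size in range(d + 1):  # smallest multisets first, so |A| is minimal
        for idxs in combinations_with_replacement(range(len(hyps)), size):
            A = [hyps[i] for i in idxs]
            total = poly_sum(A)
            F = {k: total.get(k, 0) - P.get(k, 0) for k in total.keys() | P.keys()}
            if all(c >= 0 for c in F.values()):
                return A, {k: c for k, c in F.items() if c != 0}
    return None


H1 = {"a": 1, ONE: -2}             # a - 2 <= 0
H2 = {"b": 1, "a": -1}             # b - a <= 0
goal = {"a": 1, "b": 1, ONE: -4}   # [[a + b <= 4]]
assert dlp_search([H1, H2], goal, 2) is None   # the goal has depth 3
A, F = dlp_search([H1, H2], goal, 3)
assert len(A) == 3 and F == {}
```

The number of multisets examined grows polynomially in the number of hypotheses for a fixed $d$, which is why a small fixed limit keeps this search cheap compared with general Integer Programming.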

When I discuss type preservation for my translation into TALT-R, I will claim that there is a $d$ such that the typing of the translation's output never depends on the truth of any formula with depth greater than $d$. This implies that depth-limited semantic proof search with limit $d$ is "complete enough" in the sense I described at the start of this chapter.

4.3  Incompleteness 

Since the rules of the TALT-R constraint logic do not include a schema for induction over natural numbers, it comes as no surprise that they are not complete, in the sense that not every entailment $\Delta \vdash \varphi\ \mathit{true}$ that is valid over the natural numbers can be derived using them. In fact, all of the rules of the constraint theory remain sound if the variables are allowed to range over all nonnegative rationals; thus any entailment that is valid over $\mathbb{N}$ but not over $\mathbb{Q}^{\ge 0}$ cannot possibly be derivable. For instance, $a + a \le 3 \nvdash a \le 1\ \mathit{true}$, because this judgment is not valid if $a$ is allowed to take on non-integral values (e.g., $a = 3/2$).

The next natural question is whether the TALT-R constraint logic can derive all formulas that are valid over the nonnegative rationals. The answer is no: for a counterexample, observe that $a + a \le 4 \nvdash a \le 2\ \mathit{true}$. An extension of the TALT-R constraint logic capable of deriving this judgment and others like it is described in Appendix C.
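Completeness (Lemma 4.13) gives a direct way to confirm both counterexamples: it is enough to show that no semantic proof exists. In each case $\llbracket \Delta \rrbracket$ contains a single polynomial $H$, so a candidate semantic proof uses some number $x \in \mathbb{N}$ of copies of $H$ and needs the remainder $F = xH - P$ to be nonnegative. The worked calculation below carries this out.

```latex
% a + a <= 3 |- a <= 1 :  H = 2a - 3,  P = a - 1
F = xH - P = (2x - 1)\,a + (1 - 3x), \qquad
F \ge 0 \iff x \ge \tfrac{1}{2} \ \text{and} \ x \le \tfrac{1}{3}
% no x at all satisfies both, so no semantic proof exists

% a + a <= 4 |- a <= 2 :  H = 2a - 4,  P = a - 2
F = xH - P = (2x - 1)\,a + (2 - 4x), \qquad
F \ge 0 \iff x \ge \tfrac{1}{2} \ \text{and} \ x \le \tfrac{1}{2}
% only x = 1/2 works, but multiplicities must be natural numbers
```

In the second case the rational multiplier $x = 1/2$ (halving the hypothesis) would yield the goal exactly, which is precisely the step that semantic proofs, and hence the constraint logic, cannot take.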
4.4 Chapter Summary

The clock reasoning in TALT-R, as well as the static semantics of guarded and singleton types, depends on a constraint logic whose soundness must be provable in Twelf. I have isolated an impoverished but useful theory of linear inequalities whose soundness is very straightforward, and proven that it is decidable (although I have not found an efficient decision procedure) and that it contains a convenient subset for which a simple decision procedure is easily designed.


Chapter  5 

Lilt:  A  Low-Level  Source  Language 


lilt  \lilt\  (n)  1  :  a  spirited  and  usually  cheerful  song  or  tune  2  :  a  rhythmical  swing, 
flow,  or  cadence  3  :  a  springy  buoyant  movement  [44] 

So that I can formalize the process of resource-bound certifying compilation, this chapter presents a low-level typed language that will serve as the source of a translation into MiniTALT-R. I call this language Lilt,¹ and it serves as the intermediate language in a certifying compiler for a subset of the high-level language Popcorn. (Popcorn is best known as the source language designed for compilation into TALx86 [45].) The back-end of my compiler generates EXTALT-R from Lilt using a translation based on the Lilt-to-MiniTALT-R translation in Chapter 7.

Lilt is designed to be completely ignorant of timing issues, but it does have a number of unusual characteristics motivated by its intended use in a compiler for Popcorn. Specifically, functions in a Popcorn program usually declare mutable local variables, which they read from and assign to frequently. Furthermore, Popcorn functions often contain loops and sometimes contain exception handling constructs, and it is essential that the state of the local variables be threaded through all this control flow with a minimum of work. The best implementation strategy seems to be the one (presumably) used in the majority of compilers for C-like languages, and described in many if not most traditional compiler design texts (e.g., [48]): each dynamic instance of a function allocates (at most) one stack frame in which to store its local variables, and register allocation is performed on (at least) an entire function at a time to minimize the amount of "shuffling" that must be performed.² Unfortunately, the decision to adopt this compilation model complicates the intermediate language, since it introduces a distinction between local (intraprocedural) and nonlocal (interprocedural) transfers of control, and requires an intermediate language that can deal with mutable local variables.

5.1  Syntax 

The syntax of Lilt is given in Figure 5.1. Lilt has three different syntactic classes of identifiers at the term level: function names (ranged over by $f$), which have global scope and stand for functions;

¹ The name was chosen because it is a near-acronym for "Low-level Intermediate Language," rhymes with TILT, is related to music (like most ConCert project terminology) and has implications of rhythm and liveliness, which is sort of like liveness.

² The parenthetical interjections acknowledge the possibilities of eliding the stack frame on an architecture with enough registers, and of performing interprocedural register allocation, respectively. However, our target architecture (IA-32) has few registers and we do not plan to implement any interprocedural optimizations, so we will not discuss these matters any further.
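$$
\begin{array}{lrcl}
\text{Operands} & v & ::= & s \mid n \mid \mathtt{tt} \mid \mathtt{ff} \mid * \mid f \mid \sigma@v \\
\text{Coercions} & \sigma & ::= & \mathtt{id} \mid [c_1, \ldots, c_n] \mid \mathtt{roll}_\tau \mid \mathtt{unroll} \mid \mathtt{pack}[\tau, c_1, \ldots, c_n] \\
\text{Small Expressions} & r & ::= & v \mid \mathit{op}(v_1, \ldots, v_n) \mid \pi_i v \mid \mathtt{inj}_\tau(i, v) \mid \mathtt{out}_j(r) \\
 & & \mid & (v_1, \ldots, v_n) \mid \{v_1, \ldots, v_n\} \\
\text{Conditions} & \mathit{cond} & ::= & v_1 = v_2 \mid v_1 < v_2 \\
\text{Expressions} & e & ::= & \mathtt{return}\ v \mid \mathtt{raise}\ v \mid \mathtt{goto}\ \ell[c_1, \ldots, c_n] \\
 & & \mid & \mathtt{let}\ s = r\ \mathtt{in}\ e \\
 & & \mid & \mathtt{let}\ s = v(v_1, \ldots, v_m)\ \mathtt{in}\ e \\
 & & \mid & \mathtt{let}\ s = \mathtt{sub}(v_1, v_2)\ \mathtt{in}\ e \mid \mathtt{let}\ \mathtt{sub}(v_1, v_2) := v_3\ \mathtt{in}\ e \\
 & & \mid & \mathtt{let}\ \pi_i v := v_1\ \mathtt{in}\ e \\
 & & \mid & \mathtt{let}\ (\alpha_1, \ldots, \alpha_n, s) = \mathtt{unpack}\ v\ \mathtt{in}\ e \\
 & & \mid & \mathtt{pushhandler}\ \ell[c_1, \ldots, c_n]\ \mathtt{in}\ e \mid \mathtt{pophandler}\ \mathtt{in}\ e \\
 & & \mid & \mathtt{if}\ \mathit{cond}\ \mathtt{then}\ e_1\ \mathtt{else}\ e_2 \\
 & & \mid & \mathtt{case}\ v\ \mathtt{of}\ \mathtt{inj}(i, s) \Rightarrow e_1\ \mathtt{else}\ e_2 \\
\text{Functions} & F & ::= & \mathtt{func}(\Delta; \Gamma; \tau).(\mathtt{enter}(s_1, \ldots, s_n).e, \ell_1 = B_1, \ldots, \ell_m = B_m) \\
\text{Blocks} & B & ::= & \mathtt{block}(\Delta; \Sigma; \Gamma).e \mid \mathtt{hndl}(\Delta; \Sigma; \Gamma; s).e \\
\text{Kinds} & \kappa & ::= & T \mid \kappa_1 \to \kappa_2 \\
\text{Type Constructors} & c, \tau & ::= & \alpha \mid \mathtt{int} \mid \mathtt{bool} \mid \mathtt{unit} \mid (\tau_1, \ldots, \tau_k) \mid [i_1{:}\tau_1, \ldots, i_n{:}\tau_n] \\
 & & \mid & \tau\ \mathtt{array} \mid (\tau_1, \ldots, \tau_m) \to \tau \mid \mu\alpha.\tau \\
 & & \mid & \forall \alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n.\tau \mid \exists \alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n.\tau \mid \lambda\alpha{:}\kappa.c \mid c_1\, c_2 \\
\text{Type Contexts} & \Delta & ::= & \cdot \mid \Delta, \alpha{:}\kappa \\
\text{Block Types} & \gamma & ::= & \mathtt{lbl}(\Delta; \Sigma; \Gamma) \mid \mathtt{hnd}(\Delta; \Sigma; \Gamma) \\
\text{Local Contexts} & \Gamma & ::= & [s_1{:}\tau_1, \ldots, s_n{:}\tau_n] \\
\text{Exception Stack Types} & \Sigma & ::= & \cdot \mid \Sigma, \tau \\
\text{Label Contexts} & \Lambda & ::= & \ell_1{:}\gamma_1, \ldots, \ell_n{:}\gamma_n \\
\text{Function Contexts} & \Phi & ::= & f_1{:}\tau_1, \ldots, f_n{:}\tau_n
\end{array}
$$

Figure 5.1: Lilt Syntax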
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labels  (ranged  over  by  t),  which  stand  for  code  blocks  within  a  function  and  are  meaningful  only 
inside  that  function,  and  local  variable  names  (ranged  over  by  s),  which  also  have  function  scope. 
Local  variables  are  used  as  the  names  of  a  function's  arguments  as  well  as  the  names  of  local 
storage  locations  allocated  by  a  function. 

A Lilt program is a sequence of mutually recursive function definitions, and the body of each
function consists of one or more blocks. The first block in each function is a special entry block
of the form enter(s1, ..., sn).e, which is made up of a declaration of the function's local variables
and the expression that will be evaluated when the function is called. Each of the remaining zero
or more blocks in the function body is either an ordinary block (block(Δ; Σ; Γ).e) or an exception
handler (hndl(Δ; Σ; Γ; s).e). Corresponding to these different kinds of code blocks are four different
control-transfer expression forms, namely function call, function return, unconditional jump
and raise.

If v_f is a function value, the function call expression let s = v_f(v1, ..., vm) in e causes control to
be transferred to v_f's entry block, binding the function's formal parameters to the values v1, ..., vm.
If the function returns a value, that value is copied into the local variable s and the expression e is
evaluated. The expression return v immediately exits the current function and returns the value
v to the calling function. The jump expression goto ℓ[c1, ..., cn] performs a one-way transfer of
control to the block named ℓ, passing it the type arguments c1, ..., cn and implicitly passing along
the current values of the function's arguments and local variables.

The  expression  raise  v  is  similar  to  return  v  except  that  v  must  be  an  exception  value,  and  it 
is  passed  not  to  the  calling  function  but  to  the  current  exception  handler,  which  may  have  been 
installed  by  any  pending  function  including  the  current  one.  The  handler  has  access  to  the  current 
values  of  the  arguments  and  local  variables  of  the  function  that  installed  it,  and  designates  one 
of  these  variables  to  receive  the  value  v.  The  pushhandler  and  pophandler  expression  forms 
manipulate  the  stack  of  pending  exception  handlers,  but  cannot  remove  any  handlers  installed 
before  the  call  to  the  current  function.  A  return  expression  implicitly  pops  all  exception  handlers 
installed  by  the  current  function,  restoring  the  handler  that  was  current  when  the  function  was 
called. 
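The discipline just described for pushhandler, pophandler and return can be mimicked by a small run-time model. The following Python sketch is illustrative only and is not taken from the Lilt implementation: the class HandlerStack and its method names are hypothetical, handlers are represented only by their labels, and the actual transfer of control performed by raise is not modeled.

```python
class HandlerStack:
    """Hypothetical model of the pending exception handlers (innermost last)."""

    def __init__(self):
        self.handlers = []  # (label, depth of the activation that installed it)
        self.bases = [0]    # handler-stack height when each pending activation began

    def call(self):
        # The callee starts with all of its caller's handlers still pending.
        self.bases.append(len(self.handlers))

    def pushhandler(self, label):
        self.handlers.append((label, len(self.bases) - 1))

    def pophandler(self):
        # A function may pop only the handlers that it installed itself.
        if len(self.handlers) <= self.bases[-1]:
            raise RuntimeError("pophandler cannot remove a caller's handler")
        self.handlers.pop()

    def ret(self):
        # return implicitly pops every handler installed by the current function,
        # restoring the handler that was current when the function was called.
        del self.handlers[self.bases.pop():]

    def current(self):
        # raise passes its value here, whichever pending activation installed it.
        return self.handlers[-1] if self.handlers else None
```

For example, after pushhandler("h_main"), call() and pushhandler("h_f"), the current handler is ("h_f", 1); a subsequent ret() restores ("h_main", 0), and a pophandler issued by a fresh callee before it installs anything raises an error.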

The type system of Lilt is essentially that of the higher-order polymorphic λ-calculus [27]
augmented with several useful types for programming. The language includes the base types
int, bool and unit as well as the familiar n-ary product types ((τ1, ..., τn)), array types (τ array)
and function types ((τ1, ..., τn) → τ). The variant type [i1:τ1, ..., in:τn] closely resembles
the more familiar n-ary sum type (τ1 + ··· + τn) found in other calculi; the labels i1, ..., in are
distinct integers, and serve to identify the summands. (They correspond directly to the "tag"
words used by the implementation.) We have chosen to use labeled variant types rather than
unlabeled sum types in Lilt because they admit a very straightforward translation into Talt. The
Lilt type system also includes recursive types (μα.τ), and universal and existential quantification
(∀α1:k1, ..., αn:kn.τ, ∃α1:k1, ..., αn:kn.τ). Finally, higher-order type constructors may be formed
by abstraction (λα:k.c) and applied in the usual way (c1 c2).


5.2  Static  Semantics 

The judgment forms of the Lilt type system are listed in Table 5.1. The complete set of rules
defining these judgments may be found in Appendix D; I will discuss only the more unusual
aspects of the type system in this section.

Judgment                         Meaning
Δ ⊢ c : k                        c has kind k
Δ ⊢ c1 = c2 : k                  c1 and c2 are equivalent at kind k
Δ ⊢ Γ                            Γ is well-formed
Δ ⊢ Σ                            Σ is well-formed
Δ ⊢ q : τ1 ⇒ τ2                  q coerces from τ1 to τ2
Φ; Δ; Γ ⊢ r : τ                  r has type τ
Φ; Δ; Γ ⊢_cond cond              cond is a well-formed condition
Φ; Δ; Λ; Σ; Γ; τ ⊢ e             e is well-formed
Δ; Λ; τ ⊢ B : γ                  B is a block of type γ
Φ ⊢ F : τ                        F is a function of type τ
⊢ P                              P is a well-formed program
Δ ⊢ τ1 ≤ τ2                      τ1 is a subtype of τ2
Δ ⊢ Γ1 ≤ Γ2                      Γ1 is a subtype of Γ2
Δ ⊢ Σ1 ≤ Σ2                      Σ1 is a subtype of Σ2
Δ ⊢ Σ handles Γ                  (see discussion of raise)

Table 5.1: Lilt typing judgment forms

The central typing judgment in Lilt is the one for expressions. The judgment Φ; Δ; Λ; Σ; Γ; τ ⊢ e
states that e is a well-formed expression, where:

•  Φ is a function context, which assigns types to the function symbols defined in the program.

•  Δ is a type context, which assigns kinds to constructor variables. The contents of Δ will be
the type parameters of the current function and those of the current block, plus any additional
variables introduced by unpack expressions.

•  Λ is a label context, which assigns types to the block labels in the current function.

•  Σ describes the pending exception handlers, if any, that have been installed by the current
function.

•  Γ is a local context, which assigns types to the local variable names that may appear in e.

•  τ is the return type of the current function.

If this judgment holds, then the expression e performs zero or more primitive operations and then
does one of three things: it may return a value of type τ from the current function, it may jump to
one of the labels declared in Λ, or it may raise an exception. The typing rule for return expressions
states that returning a value of the appropriate type is always permitted:

$$\frac{\Phi;\Delta;\Gamma \vdash v : \tau}{\Phi;\Delta;\Lambda;\Sigma;\Gamma;\tau \vdash \mathtt{return}\ v}$$

Jumping  to  a  label  is  allowed  provided  the  label  identifies  an  ordinary  block  (as  opposed  to  an 
exception  handler)  that  can  accept  the  current  state  of  the  local  storage  and  exception  stack.  A 
block  may  require  some  type  arguments  in  addition  to  those  of  the  enclosing  function;  the  goto 
expression  must  provide  constructors  of  the  appropriate  kinds: 

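$$\frac{\Delta \vdash c_i : k_i \ (1 \le i \le n) \qquad \Delta \vdash \Gamma \le \Gamma'[\vec c/\vec\alpha] \qquad \Delta \vdash \Sigma \le \Sigma'[\vec c/\vec\alpha]}{\Phi;\Delta;\Lambda;\Sigma;\Gamma;\tau \vdash \mathtt{goto}\ \ell[c_1,\ldots,c_n]} \quad \bigl(\Lambda(\ell) = \mathsf{lbl}(\alpha_1{:}k_1,\ldots,\alpha_n{:}k_n;\ \Sigma';\ \Gamma')\bigr)$$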
Installing an exception handler has similar typing requirements to jumping: the constructor
arguments must be properly kinded and the current stack of exception handlers must be consistent
with  the  new  handler's  expectations.  However,  it  is  not  necessary  that  the  local  context  match  the 
one  expected  by  the  handler  at  the  point  the  handler  is  installed;  this  requirement  is  deferred  to 
the  point  at  which  an  exception  is  raised.  The  rule  for  pushing  an  exception  handler  is  as  follows: 

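$$\frac{\Delta \vdash c_i : k_i \ (1 \le i \le n) \qquad \Delta \vdash \Sigma \le \Sigma'[\vec c/\vec\alpha] \qquad \Phi;\Delta;\Lambda;(\Sigma, \Gamma'[\vec c/\vec\alpha]);\Gamma;\tau \vdash e}{\Phi;\Delta;\Lambda;\Sigma;\Gamma;\tau \vdash \mathtt{pushhandler}\ \ell[c_1,\ldots,c_n]\ \mathtt{in}\ e} \quad \bigl(\Lambda(\ell) = \mathsf{hnd}(\alpha_1{:}k_1,\ldots,\alpha_n{:}k_n;\ \Sigma';\ \Gamma')\bigr)$$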

The typing rule for raise expressions requires that the local context match the one expected by
the current handler. This is captured by the premise Δ ⊢ Σ handles Γ:

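$$\frac{\Phi;\Delta;\Gamma \vdash v : \tau_{exn} \qquad \Delta \vdash \Sigma\ \mathsf{handles}\ \Gamma}{\Phi;\Delta;\Lambda;\Sigma;\Gamma;\tau \vdash \mathtt{raise}\ v}$$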

The auxiliary judgment Δ ⊢ Σ handles Γ (defined in Appendix D) holds if Σ is empty, meaning
that the current exception handler was not locally installed (in which case the contents of Γ are
irrelevant because the current locals will be discarded), or if Σ is nonempty and the local context
Γ matches the expectations of the current locally installed handler as given by Σ. Importantly,
raise v is not the only form of expression that may raise an exception. Array subscript operations
may do so (if the index is out of bounds), and so may function calls (if the callee raises an exception
it does not handle itself); therefore the typing rules for these forms of expressions must also have
premises of the form Δ ⊢ Σ handles Γ to ensure that the state of the local variables is consistent
with what the current handler requires.
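As a rough illustration of the handles premise, here is a Python sketch built on assumptions that Appendix D may not share: types are plain base-type names, the only nontrivial subtyping is τ ≤ ns (every type is a subtype of the uninitialized type ns), Σ is a list of local contexts with the innermost handler last, and "matches the expectations" is read as pointwise subtyping of the current locals against the top of Σ. The function names are hypothetical.

```python
NS = "ns"  # type of uninitialized locals; in this sketch every type is a subtype of it


def subtype(t1: str, t2: str) -> bool:
    # Base-type fragment only: reflexivity plus t <= ns.
    return t1 == t2 or t2 == NS


def ctx_subtype(g1: dict, g2: dict) -> bool:
    # Local contexts over the same variable names, compared pointwise.
    return g1.keys() == g2.keys() and all(subtype(g1[s], g2[s]) for s in g2)


def handles(sigma: list, gamma: dict) -> bool:
    if not sigma:
        # The current handler was installed by a caller, so these locals are discarded.
        return True
    # Otherwise the current locals must suit the innermost locally installed handler.
    return ctx_subtype(gamma, sigma[-1])
```

Under this reading, an array subscript or a function call would perform the same check on its current Γ, since either may transfer control to the current handler.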

Most  of  Lilt's  operations  are  performed  by  a  sort  of  let-binding  expression:  the  expression 
let  s  =  r  in  e  evaluates  r,  stores  the  result  in  location  s,  and  continues  with  e.  Its  typing  rule 
makes  use  of  an  auxiliary  judgment  to  determine  the  type  of  r: 

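$$\frac{\Phi;\Delta;\Gamma \vdash r : \tau' \qquad \Phi;\Delta;\Lambda;\Sigma;\Gamma[s \mapsto \tau'];\tau \vdash e}{\Phi;\Delta;\Lambda;\Sigma;\Gamma;\tau \vdash \mathtt{let}\ s = r\ \mathtt{in}\ e}$$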

The terms ranged over by r (the so-called "small expressions") are generally single primitive
operations performed on syntactic values; they involve no control flow, cannot raise exceptions, and
have  no  side  effects  (except  possibly  allocation,  which  may  fail  and  terminate  the  program).  Of 
these  operations,  arithmetic,  tuple  allocation  and  projection  are  relatively  standard  and  have  the 
expected  typing  rules.  Slightly  unusual  features  of  Lilt  at  this  level  are  the  treatment  of  labeled 
variant  types  (a  generalization  of  disjoint  union  or  sum  types),  and  the  use  of  coercions. 

Variants  A value of variant type is created as usual by the inj operation, which takes a tag
integer j and a value v, and produces a value of any variant type containing a summand tagged j
whose type is that of v:

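$$\frac{\Delta \vdash \tau = [\ldots, j{:}\tau_j, \ldots] : T \qquad \Phi;\Delta;\Gamma \vdash v : \tau_j}{\Phi;\Delta;\Gamma \vdash \mathtt{inj}_\tau(j, v) : \tau}$$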

Given a value of variant type, accessing its contents is a two-stage process: the case expression
form "narrows" the type until it has only one variant, and then the out_j operation can extract the
carried value:


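$$\frac{\Phi;\Delta;\Gamma \vdash v : [i{:}\tau_i,\ \overrightarrow{j{:}\tau_j}] \qquad \Phi;\Delta;\Lambda;\Sigma;\Gamma[s \mapsto [i{:}\tau_i]];\tau \vdash e_1 \qquad \Phi;\Delta;\Lambda;\Sigma;\Gamma[s \mapsto [\overrightarrow{j{:}\tau_j}]];\tau \vdash e_2}{\Phi;\Delta;\Lambda;\Sigma;\Gamma;\tau \vdash \mathtt{case}\ v\ \mathtt{of}\ \mathtt{inj}(i, s) \Rightarrow e_1\ \mathtt{else}\ e_2}$$

$$\frac{\Phi;\Delta;\Gamma \vdash v : [j{:}\tau]}{\Phi;\Delta;\Gamma \vdash \mathtt{out}_j(v) : \tau}$$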

The case expression typed in this rule examines the value v, which has a variant type, compares
the tag of v to the number i and then continues with either e1 or e2, after placing a version of v
with an appropriately refined type in the location s. (Here it is important that all the tags in the
sum type are syntactically required to be distinct.) The typing of e1 assumes that s has the unary
variant type corresponding to the i branch of the type of v; the typing of e2 assumes s has a variant
type consisting of all the remaining branches of v's original type. The small expression out_j(v)
assumes v has a unary variant type, and retrieves the value it carries.
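The type refinement performed by case, and the restriction of out_j to unary variants, can be sketched in Python. This is a hypothetical model of the typing rules only (no values are involved): a variant type is represented as a dictionary from tag to carried type, which also enforces the requirement that tags be distinct, and carried types are opaque strings.

```python
def narrow(variant: dict, i: int):
    """Types given to s in the two branches of `case v of inj(i, s) => e1 else e2`."""
    if i not in variant:
        raise TypeError(f"tag {i} is not a summand of the variant type")
    then_type = {i: variant[i]}                               # unary [i:tau_i] for e1
    else_type = {j: t for j, t in variant.items() if j != i}  # remaining branches for e2
    return then_type, else_type


def out_type(variant: dict, j: int):
    """Result type of out_j(v); defined only on the unary variant [j:tau]."""
    if list(variant) != [j]:
        raise TypeError("out_j requires a unary variant type [j:tau]")
    return variant[j]
```

With the two-summand type {0: "unit", 1: "(a, list a)"}, narrowing on tag 0 leaves {1: "(a, list a)"} in the else branch, and out_type on that unary variant with tag 1 yields "(a, list a)".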


Coercions  The operations of ∀-elimination, ∃-introduction, and introduction and elimination of
recursive types are intended to have the special property that, when applied to values, they require
no run-time work to compute. It is reasonably common practice to simply include expression
forms with this property among the syntactic values (or in Lilt, the operands) of the language.
This is what I have done, except that I group these four different forms of values into one, namely
the application of a coercion to a value (written q@v). From a typing point of view, coercions
behave a bit like functions; in particular, the rule for coercion application is just like the usual
function application rule:

$$\frac{\Phi;\Delta;\Gamma \vdash v : \tau_2 \qquad \Delta \vdash q : \tau_2 \Rightarrow \tau}{\Phi;\Delta;\Gamma \vdash q@v : \tau}$$

The typing rules for the coercions themselves are derived from the standard typing rules for the
constructs they replace. The ∀-elimination coercion, written [c1, ..., cn], instantiates a value of a
∀-type:

$$\frac{\Delta \vdash c_i : k_i \ (1 \le i \le n)}{\Delta \vdash [c_1,\ldots,c_n] : \forall\alpha_1{:}k_1,\ldots,\alpha_n{:}k_n.\tau \Rightarrow \tau[c_1,\ldots,c_n/\alpha_1,\ldots,\alpha_n]}$$

The ∃-introduction coercion, written pack[τ, c1, ..., cn], is similar:

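$$\frac{\Delta \vdash \tau = \exists\alpha_1{:}k_1,\ldots,\alpha_n{:}k_n.\tau' : T \qquad \Delta \vdash c_i : k_i \ (1 \le i \le n)}{\Delta \vdash \mathtt{pack}[\tau, c_1,\ldots,c_n] : \tau'[c_1,\ldots,c_n/\alpha_1,\ldots,\alpha_n] \Rightarrow \tau}$$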

The  roll  and  unroll  coercions  mediate  between  a  recursive  type  and  its  unrolling: 

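$$\frac{\Delta \vdash \tau = \mu\alpha.\tau' : T}{\Delta \vdash \mathtt{roll}_\tau : \tau'[\tau/\alpha] \Rightarrow \tau} \qquad \frac{\Delta \vdash \mu\alpha.\tau : T}{\Delta \vdash \mathtt{unroll} : \mu\alpha.\tau \Rightarrow \tau[\mu\alpha.\tau/\alpha]}$$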

Roughly speaking, Lilt uses coercions for operations whose Talt equivalents are subtyping
rules rather than value forms or instructions (and which are therefore represented as coercions in
XTALT and EXTALT). This is no accident, since the "operations" captured by subtyping rules
in Talt (in which subtyping is resolutely inclusive rather than coercive) clearly amount to the
identity.
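Because coercions do no run-time work, their typing rules amount to type-level computation. The Python sketch below computes the type produced by unroll, τ[μα.τ/α], using a hypothetical tuple encoding of a small fragment of Lilt types; it assumes the type being substituted is closed, so no capture-avoiding renaming is needed. The example type is list int in the sense of Figure 5.4, whose unrolling is listS int; the source type of roll_τ is that same unrolling.

```python
# Hypothetical encoding of a fragment of Lilt types as nested tuples:
#   ("base", name) | ("var", name) | ("prod", (t1, ..., tn))
#   | ("variant", ((tag, t), ...)) | ("mu", name, body)


def subst(t, a, s):
    """t[s/a]; assumes s is closed, so no variable capture can occur."""
    kind = t[0]
    if kind == "var":
        return s if t[1] == a else t
    if kind == "base":
        return t
    if kind == "prod":
        return ("prod", tuple(subst(x, a, s) for x in t[1]))
    if kind == "variant":
        return ("variant", tuple((tag, subst(x, a, s)) for tag, x in t[1]))
    if kind == "mu":
        # An inner binder with the same name shadows a.
        return t if t[1] == a else ("mu", t[1], subst(t[2], a, s))
    raise ValueError(f"unknown type form {kind!r}")


def unroll_type(t):
    """unroll : mu a.body => body[mu a.body / a]."""
    if t[0] != "mu":
        raise TypeError("unroll expects a recursive type")
    return subst(t[2], t[1], t)


def apply_coercion(q, v):
    # Coercions change only the static type; at run time q@v is just v.
    return v


INT, UNIT = ("base", "int"), ("base", "unit")
# list int = mu b. [0:unit, 1:(int, b)]
LIST_INT = ("mu", "b", ("variant", ((0, UNIT), (1, ("prod", (INT, ("var", "b")))))))
```

Here unroll_type(LIST_INT) evaluates to ("variant", ((0, UNIT), (1, ("prod", (INT, LIST_INT))))), the encoding of listS int.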
Popcorn:

    int rfib(int n) {
      if (n < 2) return 1;
      return rfib(n-1) + rfib(n-2);
    }

Lilt:

    rfib = func(·; [n:int]; int).(
      enter(t1, t2).
        if n < 2 then
          return 1
        else
          let n = -(n, 1) in
          let t1 = rfib(n) in
          let n = -(n, 1) in
          let t2 = rfib(n) in
          let t1 = +(t1, t2) in
          return t1)

Figure 5.2: Lilt Example: Recursive Fibonacci


Popcorn:

    int fib(int n)
    {
      int a, b, c;
      a = 1; b = 1;
      while (n != 0) {
        c = a + b;
        a = b;
        b = c;
        n--;
      }
      return a;
    }

Lilt:

    fib = func(·; [n:int]; int).(
      enter(a, b, c).
        let a = 1 in
        let b = 1 in
        goto loop,
      loop = block(·; ·; [n:int, a:int, b:int, c:ns]).
        if n = 0 then
          return a
        else
          let c = +(a, b) in
          let a = b in
          let b = c in
          let n = -(n, 1) in
          goto loop)

Figure 5.3: Lilt Example: Iterative Fibonacci
Define:

    listF = λα:T. λβ:T. [0:unit, 1:(α, β)]
    list  = λα:T. μβ. listF α β
    listS = λα:T. listF α (list α)

Popcorn:

    union <a>list {
      void nil;
      *(a, <a>list) cons;
    }

    <a>list rev<a>(<a>list L) {
      <a>list M = ^.nil;
      while (true) {
        switch (L) {
        case nil: return M;
        case cons*(h, t):
          M = ^.cons(^(h, M));
          L = t;
        }
      }
      // (Dead code)
      return M;
    }

Lilt:

    rev = func(α:T; [L:list α]; list α).(
      enter(M, h).
        let M = inj_{listS α}(0, *) in
        let M = roll_{list α}@M in
        goto loop,
      loop = block(·; ·; [L:list α, M:list α, h:ns]).
        case unroll@L of inj(0, L) ⇒
          return M
        else
          let L = out_1(L) in
          let h = π_0 L in
          let h = (h, M) in
          let M = inj_{listS α}(1, h) in
          let M = roll_{list α}@M in
          let L = π_1 L in
          goto loop)

Figure 5.4: Lilt Example: List Reversal


5.3  Lilt  Examples 


A  very  simple  Lilt  function,  illustrating  the  use  of  local  variables,  is  shown  in  Figure  5.2.  On  the 
left  side  of  the  figure  is  a  Popcorn  (or  C  or  Java)  function  that  computes  the  nth  Fibonacci  number 
using  the  obvious  but  inefficient  recursive  method;  on  the  right  is  the  approximate  Lilt  equivalent. 
Note that the entry block of the Lilt function declares the two local variables t1 and t2 but does
not  give  types  for  them:  at  the  start  of  the  entry  block,  the  local  variables  are  uninitialized  and  so 
they  have  type  ns.  Also  note  that  as  in  C-like  languages,  a  function  is  allowed  to  assign  into  its 
arguments:  the  Lilt  version  of  rfib  destructively  modifies  its  parameter  n  to  compute  the  argument 
of  each  recursive  call. 

A somewhat more interesting function, involving some local control flow, is the function fib
shown in Figure 5.3, which computes Fibonacci numbers using a linear-time loop instead of
recursion. Again, note that the three local variables have type ns when they are first allocated. When
the block called loop is invoked at the end of the entry block, a and b have been initialized, but
c has not; therefore loop's block header specifies the type int for a and for b (as well as for the
argument n), but expects that c still has type ns. By the time loop invokes itself (in the last line of
code), c has been assigned an integer; the jump is still well-typed because int is a subtype of ns.
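The block structure of Figure 5.3 can be made concrete with a hypothetical Python transliteration in which each Lilt block is a function over a single shared table of locals, so that goto implicitly carries the current values of the argument and locals along. The encoding (the names fib_blocks and rfib, and the use of None for a slot of type ns) is an illustration, not the thesis's compiler output. Comparing it with rfib from Figure 5.2 checks that the two figures compute the same function.

```python
def rfib(n: int) -> int:
    # Figure 5.2: note that rfib(0) = rfib(1) = 1.
    return 1 if n < 2 else rfib(n - 1) + rfib(n - 2)


def fib_blocks(n: int) -> int:
    # One shared table of locals; None marks a slot of type ns (uninitialized).
    env = {"n": n, "a": None, "b": None, "c": None}

    def entry(env):
        env["a"] = 1
        env["b"] = 1
        return ("goto", "loop")

    def loop(env):
        if env["n"] == 0:
            return ("return", env["a"])
        env["c"] = env["a"] + env["b"]
        env["a"] = env["b"]
        env["b"] = env["c"]
        env["n"] = env["n"] - 1
        return ("goto", "loop")

    blocks = {"loop": loop}          # ordinary blocks, addressed by label
    action = entry(env)              # the entry block runs first
    while action[0] == "goto":       # one-way transfer; the locals travel along
        action = blocks[action[1]](env)
    return action[1]
```

For instance, fib_blocks(5) and rfib(5) both return 8.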

A  function  with  similar  control-flow  structure  but  more  complex  typing  is  the  polymorphic 
list  reversal  function  shown  in  Figure  5.4.  This  example  uses  the  polymorphic  type  constructor 
list, defined as follows:

$$\mathit{list} = \lambda\alpha{:}T.\ \mu\beta.\ [0 : \mathit{unit},\ 1 : (\alpha, \beta)]$$

(Note that the type list τ is recursive; this recursion is not marked by any special syntax in Popcorn,
but must be written with a μ-type in Lilt.) For convenience, the constructor listS is also defined
in the figure; listS τ is simply the unrolling of the recursive type list τ. At the beginning of the
function rev, the variable M is initialized with an empty list; this is a two-stage process in Lilt,
consisting of an injection (to produce a value of type listS α) and an application of the coercion
roll_{list α} to create the list itself. The block named loop examines the list currently stored in the
argument location L by unrolling it and performing a case analysis. In the case where the tag is
0 (that is, L is the empty list), the current value of M is returned from the function. In the case
where the tag is not 0 (i.e., the tag is 1, meaning L is a cons), the components of L are extracted by
outjection and projection, the head of L is added to the front of M, the tail is stored back into L,
and the loop is evaluated again.
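The evaluation of rev in Figure 5.4 can be traced with a hypothetical Python model. It relies on the point made about coercions above: roll and unroll do no run-time work, so a list value is represented directly by its unrolling, a tagged pair (tag, payload), with () standing in for the unit value *. The helper names are invented for illustration.

```python
# Hypothetical run-time representation: roll/unroll are identities, so a list
# value is its unrolling, a tagged pair (tag, payload).
NIL = (0, ())                      # inj_{listS a}(0, *)


def cons(h, t):
    return (1, (h, t))             # inj_{listS a}(1, (h, t)); roll is a no-op


def rev(L):
    M = NIL                        # entry block: M starts as the empty list
    while True:                    # block "loop"
        tag, payload = L           # case unroll@L of inj(0, L) => ...
        if tag == 0:
            return M
        h, t = payload             # out_1 followed by the two projections
        M = cons(h, M)             # add the head to the front of M
        L = t                      # store the tail back into L, then goto loop


def from_list(xs):
    L = NIL
    for x in reversed(xs):
        L = cons(x, L)
    return L


def to_list(L):
    out = []
    while L[0] == 1:
        h, L = L[1]
        out.append(h)
    return out
```

For example, to_list(rev(from_list([1, 2, 3]))) gives [3, 2, 1].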
Chapter 6

Yield  Placement  and  Polling  Techniques 


The major novel element in compiling Lilt to Talt-R is, naturally, the placement of yield
instructions so that the typing conditions regarding the virtual clock are satisfied. As mentioned briefly
in  Chapter  1,  one  of  the  key  claims  in  this  thesis  is  that  a  requirement  that  programs  be  certifiably 
responsive  need  not  place  any  burden  on  the  typical  application  programmer.  To  support  this 
claim,  I  limit  my  discussion  in  this  chapter  and  elsewhere  to  techniques  for  placing  yields  in  a 
program  with  no  timing-related  input  from  the  programmer  at  all.  I  shall  comment  further  on  this 
assumption  in  Chapter  9. 

One  possible  strategy  is  to  place  a  yield  at  the  beginning  of  every  basic  block  in  the  program 
and  every  Y  instructions  thereafter;  this  idea,  while  sound,  is  not  very  appealing  because  yielding 
is  likely  to  be  very  expensive.  (One  can  easily  imagine  a  multiprocessing  scenario  where  every 
yield  allows  an  unbounded  number  of  other  processes  to  execute  for  up  to  Y  instructions  each.)  I 
will  describe  a  number  of  simple  yield  placement  heuristics  in  this  chapter,  intended  to  increase 
the  actual  time  between  yields  executed  by  programs  as  much  as  possible  (while  keeping  it  less 
than  Y).  These  direct  placement  strategies,  however,  all  fall  short  of  optimal  performance  if  Y  is 
large.  Later  on,  I  will  explain  how  the  singleton  and  guarded  types  of  Talt-R  may  be  used  to 
implement  dynamic  checks  that  avoid  the  limitations  of  direct  yield  placement  strategies,  greatly 
reducing  the  number  of  actual  yields  performed.  However,  even  these  checks  are  not  free,  so  it 
is  to  one's  advantage  to  minimize  the  number  of  them  that  are  needed.  Placement  of  checkpoints 
is  essentially  the  same  problem  as  placement  of  yield  instructions,  but  the  types  involved  are 
more  complicated.  Therefore,  for  the  sake  of  clarity,  I  will  structure  the  discussion  as  follows:  first, 
I  will  explain  some  strategies  for  placing  yield  instructions  with  no  dynamic  checks;  then,  I  will 
explain  how  dynamic  checking  is  possible.  The  translation  of  Lilt  to  MiniTALT-R  I  give  later  will 
combine  these  ideas,  using  the  placement  strategies  I  discuss  here  to  place  dynamic  checkpoints 
rather  than  actual  yield  instructions. 
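The naive strategy described at the start of this chapter, a yield at the beginning of every basic block and every Y instructions thereafter, is easy to state as code. This Python sketch assumes that every instruction costs one tick of the virtual clock and represents a basic block as a list of instruction names; naive_yields and max_run are hypothetical helpers.

```python
def naive_yields(blocks, Y):
    """Insert "yield" at the start of each basic block and before every
    further run of Y instructions (unit cost per instruction assumed)."""
    placed_blocks = []
    for block in blocks:
        placed = ["yield"]
        for k, instr in enumerate(block):
            if k > 0 and k % Y == 0:
                placed.append("yield")
            placed.append(instr)
        placed_blocks.append(placed)
    return placed_blocks


def max_run(code):
    """Longest stretch of non-yield instructions in a flat instruction list."""
    longest = run = 0
    for instr in code:
        run = 0 if instr == "yield" else run + 1
        longest = max(longest, run)
    return longest
```

Since every control-flow path is a sequence of whole blocks, each beginning with a yield, no path runs more than Y instructions without yielding; the price is a yield on every block entry, which is what the heuristics below try to avoid.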

Yield  placement  in  straight-line  code  is  not  interesting:  one  simply  ensures  that  there  are  no 
more  than  Y  non-yielding  instructions  in  between  any  two  consecutive  yields.  The  challenge  of 
yield placement is focused around instructions that perform transfers of control. If the virtual
clock at the point of a jump is less than the value expected by the code being jumped to, a yield is
necessary before the jump; on the other hand, if the virtual clock before a jump is greater than
required, the next yield will happen sooner than necessary. There are essentially four different kinds
of jumps in Lilt programs (function call, return, goto and raise), which subdivide yield
placement into three subproblems. Local, or intraprocedural, placement is the problem of ensuring that
goto expressions obey the virtual clock rules; global, or interprocedural, placement is concerned
with function calls and returns; and finally exceptional placement deals with the timing properties
of exception handling. I will discuss each of these subproblems of yield placement in turn.

Figure 6.1: A Flow Graph With a Join


6.1  Local  Placement 

The problem of local, or intraprocedural, yield placement is concerned with determining the initial
virtual  clock  assumptions  for  all  of  the  ordinary  blocks  in  a  Lilt  function  (that  is,  those  that  are  not 
exception  handlers  and  are  not  the  entry  block),  and  the  placement  of  yield  points  consistent  with 
these  assumptions.  This  task  is  simplified  by  the  fact  that  the  targets  of  all  local  jumps  (that  is, 
goto  expressions)  are  known,  so  an  accurate  flow  graph  for  the  ordinary  blocks  of  the  function 
can  be  built.  Even  so,  optimal  yield  placement  is  tricky.  I  will  describe  three  simple  heuristics  here, 
one  of  which  I  have  implemented  in  my  prototype  compiler;  after  I  discuss  dynamic  checks  I  will 
be  able  to  formulate  a  fourth. 


Yield-on-Jump  The  most  naive  local  yield  placement  strategy,  but  the  simplest  to  implement,  is 
to  assume  that  every  local  jump  will  involve  a  yield.  This  can  be  accomplished  either  by  assuming 
a  virtual  clock  of  zero  at  the  start  of  every  block,  or  by  assuming  a  virtual  clock  of  Y  —  1  at  the  start 
of  every  block.  In  the  former  case,  the  first  instruction  in  every  block  must  be  a  yield;  in  the  latter, 
the  last  instruction  before  every  jump  must  be  a  yield. 

Because  these  yield-on-jump  strategies  treat  every  block  and  every  jump  the  same,  making  no 
use  of  one's  static  knowledge  of  each  jump's  target,  it  is  easy  to  see  that  they  place  more  yields 
than necessary. Figure 6.1, for example, shows a flow graph corresponding to two Lilt blocks and
containing  one  join  point.  (In  Lilt,  the  extended  basic  block  consisting  of  basic  blocks  Al,  A2  and 
A3  is  thought  of  as  a  single  block.)  If  all  of  these  basic  blocks  are  short,  and  none  of  them  contains 
any  function  calls  (so  that  global  yield  placement  does  not  affect  the  example),  then  it  may  be 
unnecessary  to  yield  at  the  start  of  block  B.  In  general,  yield-on-jump  appears  to  be  badly  behaved 
for  acyclic  Lilt  functions  that  contain  several  blocks.  The  next  two  candidate  strategies  attempt  to 
do  better  on  acyclic  functions  by  propagating  approximate  timing  information  between  blocks. 


Forward Propagation  For the other two local yield placement heuristics I will consider, it is
necessary to distinguish between forward and backward jumps. Specifically, I assume a total ordering
on the blocks in a function; a jump whose target is a later block than the one where the jump
appears  is  called  a  forward  jump,  and  one  whose  target  is  an  earlier  block,  or  the  very  one  in 
which  the  jump  occurs,  is  called  a  backward  jump.  Note  that  if  the  flow  graph  of  a  function  is 
acyclic,  then  it  is  possible  to  arrange  the  ordering  such  that  all  jumps  are  forward;  in  a  function 
containing  loops,  every  loop  necessarily  contains  at  least  one  backward  jump.  Loops  are  a  source 
of  difficulty  for  local  yield  placement,  since  my  system  (probably)  lacks  the  expressive  power  to 
avoid  yielding  at  least  once  per  iteration,  so  I  expect  that  my  heuristics  will  give  the  best  results 
when  the  ordering  on  blocks  minimizes  the  number  of  backward  jumps.  Rather  than  attempt  to 
find  such  an  ordering,  however,  I  will  simply  use  the  order  in  which  the  blocks  appear  in  the  Lilt 
representation of the function.¹

The  first  nontrivial  local  yield  placement  heuristic  is  based  on  the  operation  of  propagating 
clock  information  forward  through  a  block  as  code  for  the  block  is  generated.  The  process  is 
basically  intuitive:  starting  with  an  initial  assumption  about  the  virtual  clock  at  the  start  of  the 
block,  generate  the  instructions  for  the  block,  tracking  the  decrements  to  the  virtual  clock  with 
each  instruction.  (The  global  yield  placement  strategy  will  determine  the  effect  function  calls  have 
on  the  clock.)  If  the  clock  ever  reaches  zero  (or  becomes  inconveniently  small  for  any  operation 
that  must  be  compiled),  insert  a  yield  and  reset  it  to  Y.  At  each  leaf  of  the  extended  basic  block, 
one  is  faced  with  either  a  return,  a  raise  or  a  goto  and  a  certain  predicted  value  on  the  clock.  In 
each  of  these  cases  it  may  or  may  not  be  necessary  to  yield  before  the  transfer  of  control.  In  the 
case  of  return  and  raise,  the  decision  is  made  based  on  the  global  and  exceptional  placement 
strategies  in  use,  respectively.  It  therefore  remains  only  to  show  how  to  handle  goto. 

The  forward-propagation  method  generates  code  for  a  function  as  follows.  Compile  the  blocks 
in  order,  starting  with  the  entry  block  of  the  function.  The  initial  condition  of  the  entry  block  is 
determined  by  the  global  placement  strategy;  the  initial  condition  of  a  handler  block  is  determined 
by the exceptional placement strategy. For ordinary blocks, note that by the time we compile a
block (labeled by) ℓ, all forward jumps to ℓ have already been compiled. Therefore, these blocks
may be handled using three rules:

•  Do not yield before a forward jump, unless a yield is necessary to accommodate the jmp
instruction itself.

•  The initial condition for each ordinary block ℓ is the minimum virtual clock value seen at
any forward jump to ℓ, adjusted to account for the jump instruction.

•  For  backward  jumps,  the  target  block  has  already  been  compiled;  determine  whether  to  yield 
before  jumping  based  on  the  target  block's  initial  condition. 
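The three rules above can be sketched in Python for a simplified model of a function. Assumptions of the sketch, none of which come from the thesis: each block is a triple (label, number of instructions, goto targets) with blocks[0] as the entry block; every instruction, including each jmp, costs one tick; the clock counts ticks remaining before a yield is required, and a yield resets it to Y; all leaves of a block see the clock at the end of its straight-line body; return and raise leaves are omitted; and a block reached by no forward jump is given the initial condition 0. The function name is hypothetical.

```python
def forward_propagate(blocks, Y, entry_clock):
    """blocks: ordered list of (label, n_instrs, goto_targets); blocks[0] is the entry.
    Returns each block's initial clock condition and where yields were placed."""
    order = {label: i for i, (label, _, _) in enumerate(blocks)}
    incoming = {label: [] for label, _, _ in blocks}  # clocks left by forward jumps
    init, yields = {}, []
    for i, (label, n, targets) in enumerate(blocks):
        # Rule 2: minimum over the forward jumps already compiled (0 if none).
        clock = entry_clock if i == 0 else min(incoming[label], default=0)
        init[label] = clock
        for k in range(n):
            if clock < 1:                  # out of time: yield, reset to Y
                yields.append((label, k))
                clock = Y
            clock -= 1
        for t in targets:
            assert order[t] != 0, "the entry block is not a goto target"
            k = clock                      # every leaf sees the same clock here
            if order[t] > i:               # rule 1: forward jump
                if k < 1:                  # only the jmp itself forces a yield
                    yields.append((label, "jmp " + t))
                    k = Y
                incoming[t].append(k - 1)  # clock left after the jmp executes
            elif k - 1 < init[t]:          # rule 3: backward jump, target compiled
                yields.append((label, "jmp " + t))
    return init, yields


print(forward_propagate([("A", 3, ["B", "B"]), ("B", 2, [])], Y=10, entry_clock=10))
# ({'A': 10, 'B': 6}, [])
print(forward_propagate([("A", 2, ["L"]), ("L", 3, ["L"])], Y=5, entry_clock=5))
# ({'A': 5, 'L': 2}, [('L', 2)])
```

The first call models a join like the one in Figure 6.1: block B needs no yield at its start, which yield-on-jump would have forced. In the second, the loop L yields once per iteration, but inside its body rather than at the backward edge, because the 3 ticks left after the jmp still satisfy L's initial condition of 2.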

This  approach  has  the  advantage  that,  although  every  loop  needs  at  least  one  yield,  there 
may  not  need  to  be  a  yield  at  every  backward  edge  if  the  initial  assumption  at  the  top  of  the 
loop  is  small  enough.  It  also  may  be  more  algorithmically  convenient  to  use  this  heuristic,  which 
processes  blocks  in  a  forward  direction,  than  the  next  one  in  which  blocks  are  scanned  backwards. 


Backward  Propagation  The  forward  propagation  method  started  with  an  initial  assumption 
about  each  block  and  determined  what  the  block  could  guarantee  at  each  leaf.  It  is  also  possible 
to place yields by starting with the requirement at each leaf of a block, and propagating backward
to determine the requirement at the block's beginning. To do this, the instructions for each basic
block must be generated in reverse order, incrementing the requirement (rather than decrementing
an assumption) with each instruction until the value reaches Y. When this happens, a yield
is inserted and the requirement is reset to zero. Conditional expressions within extended basic
blocks require conservative approximation: the requirement before an if or a case instruction is
computed based on the maximum of the requirements of the branches.

¹ The problem of finding an optimal ordering is NP-hard (stated without proof by Manber [43], p. 429).

To  generate  code  for  a  function  using  the  backward  propagation  method,  compile  the  blocks 
in reverse order. For each leaf of each block, determine the final requirement: for return and raise
this  comes  from  the  global  and  exceptional  placement  policies,  as  before.  For  jumps,  there  are  two 
cases. 

•  If the jump is forward, then its target has already been compiled. The final requirement of
the current basic block is then the target block's initial requirement, plus the cost of the jmp
instruction. If this is greater than Y, insert a yield before the jump.

•  If  the  jump  is  backward,  then  insert  a  yield  and  assume  a  final  requirement  of  zero. 

Since  the  initial  conditions  of  the  exception  handler  blocks  and  of  the  function's  entry  block  are 
not  determined  by  local  placement,  it  may  be  necessary  to  insert  a  yield  at  the  beginnings  of 
these  blocks  if  the  computed  initial  requirement  exceeds  this  initial  condition. 
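Backward propagation can also be sketched in Python over a simplified model of a function: blocks are triples (label, number of instructions, goto targets) with blocks[0] as the entry, every instruction including each jmp costs one tick, and return and raise leaves are omitted. This is an assumption-laden illustration rather than the compiler's code, and the function name is hypothetical: a requirement is the number of clock ticks that must remain at a point, a yield is inserted wherever the requirement reaches Y (the sketch applies that same threshold at forward jumps, which under unit costs keeps every block's requirement below Y), and the entry block receives a yield at its start if its requirement exceeds the condition supplied by the global policy, passed in as entry_clock.

```python
def backward_propagate(blocks, Y, entry_clock):
    """blocks: ordered list of (label, n_instrs, goto_targets); blocks[0] is the entry.
    Returns each block's initial requirement and where yields were placed."""
    order = {label: i for i, (label, _, _) in enumerate(blocks)}
    req, yields = {}, []
    for i in reversed(range(len(blocks))):       # compile blocks in reverse order
        label, n, targets = blocks[i]
        leaf_reqs = []
        for t in targets:
            if order[t] > i:                     # forward: target already compiled
                r = req[t] + 1                   # target's requirement plus the jmp
                if r >= Y:
                    yields.append((label, "jmp " + t))
                    r = 0
            else:                                # backward: always yield first
                yields.append((label, "jmp " + t))
                r = 0
            leaf_reqs.append(r)
        r = max(leaf_reqs, default=0)            # conservative over the branches
        for k in reversed(range(n)):             # walk the instructions backwards
            r += 1
            if r >= Y:                           # reached Y: yield before instr k
                yields.append((label, k))
                r = 0
        req[label] = r
    entry = blocks[0][0]
    if req[entry] > entry_clock:                 # entry condition is fixed globally
        yields.append((entry, "start"))
    return req, yields


print(backward_propagate([("A", 3, ["B", "B"]), ("B", 2, [])], Y=10, entry_clock=10))
# ({'B': 2, 'A': 6}, [])
print(backward_propagate([("A", 2, ["L"]), ("L", 3, ["L"])], Y=5, entry_clock=5))
# ({'L': 3, 'A': 1}, [('L', 'jmp L'), ('A', 1)])
```

On the two-block loop, the rule for backward jumps places a yield before the jmp at the end of L, and L's initial requirement of 3 ticks plus the forward jmp then forces a second yield inside A.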

The backward propagation method has the advantage that it does not require tracking any
additional information, whereas for forward propagation one has to remember the minimum clock
value associated with each forward jump until the target block is compiled. However, the
assumption that every return has the same requirement does not mesh well with the global placement
strategy I developed for my compiler. I therefore used forward rather than backward propagation
for local yield placement.


6.2  Global  Placement  with  Call-Return  Yielding 

Global, or interprocedural, yield placement differs from local placement in that function pointers
are  first-class  values  in  Lilt,  and  therefore  for  some  call  sites  it  may  not  be  statically  obvious  which 
function  is  being  called.  Thus,  finding  a  guaranteed  optimal  placement  of  yield  points  would 
seem  to  require  interprocedural  control  flow  analysis.  Fortunately,  I  know  of  at  least  two  global 
yield  placement  strategies  that  do  not  require  this  complexity:  these  methods  treat  all  functions 
and  all  function  call  sites  equally,  avoiding  the  need  to  match  up  function  calls  with  their  targets.  I 
will  describe  these  two  strategies,  which  I  call  call-return  yielding  and  Feeley  yielding,  before  moving 
on  to  discuss  yield  placement  for  the  exception  handling  features  of  Lilt. 

It  is  possible  to  devise  a  global  placement  heuristic  that  relies  on  only  a  small  portion  of  the 
TALT-R  type  system.  First,  note  that  the  inclusion  of  a  term  for  ck  in  the  register  file  type  allows 
one  to  specify  the  time  on  the  virtual  clock  at  the  start  and  end  of  a  function,  similarly  to  TALres 
[18].  For  instance,  the  type 

$$\forall\rho{:}\mathrm{TD}.\ \{\mathtt{eax}{:}B_4,\ \mathtt{esp}{:}(\{\mathtt{eax}{:}B_4,\ \mathtt{esp}{:}\rho,\ \mathtt{ck}{:}k_2\} \to 0) \times \rho,\ \mathtt{ck}{:}k_1\} \to 0$$

describes a function that takes an integer argument (in eax) and returns an integer (also in eax);
further, this function may be called whenever there is at least k₁ + 1 on the virtual clock and is
guaranteed to return with at least k₂ remaining. Unlike in TALres, however, this function may be
called at any time (assuming that 0 ≤ k₁ < Y): if the value of the virtual clock at the desired call
site is not known to be at least k₁ + 1, the caller simply yields before making the call, resetting
the virtual clock to Y. Similarly, if k₂ is not enough time for the caller to complete its own work,
it has only to yield after the function returns. Furthermore, by similar arguments (and with the
added assumption that 0 ≤ k₂ < Y), any function may be made to satisfy these timing properties
by proper local yield placement (which, as discussed above, may include inserting yield
instructions at the function's beginning and end).

As an interesting special case, consider setting k₁ = k₂ = 0 for every function in a program.
This forces the first instruction of each function's body, and the instruction immediately following
each call instruction, to be a yield, so I call this scheme call-return yielding. (Choosing
k₁ = k₂ = Y − 1 would have a similar effect, except that the yields would need to occur just before,
rather than just after, the jumps.) Call-return yielding is simple, but it is far from optimal if Y is large
compared to the running time of most functions (a reasonable assumption). If some functions are
very short compared to Y, it would be safe to perform several calls to these functions in succession
with no yields at all, but the call-return strategy incurs the cost of the yield operation at least twice
per call.
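The caller's side of this calling convention can be sketched as a small decision procedure. It is a hypothetical helper, not part of Talt-R, and it assumes that a call instruction costs one tick (which is why k₁ + 1 is needed at the call site). With k₁ = k₂ = 0, any caller with work left to do is forced to yield after every return, as described above.

```python
from dataclasses import dataclass


@dataclass
class CallSitePlan:
    yield_before_call: bool
    yield_after_return: bool
    clock_after: int  # what the caller may assume once it resumes


def plan_call_site(clock: int, k1: int, k2: int, need_after: int, Y: int) -> CallSitePlan:
    """Caller-side yields for a callee typed with entry clock k1 and return clock k2.

    Hypothetical helper. Assumes a call instruction costs one tick, so k1 + 1
    is needed at the call site. `clock` is what the caller can prove before
    the call; `need_after` is what its own code needs once the callee returns.
    """
    assert 0 <= k1 < Y and 0 <= k2 < Y and need_after <= Y
    yield_before = clock < k1 + 1      # cannot prove enough time: yield first
    after = k2                         # all that the callee's type promises
    yield_after = after < need_after   # caller's remaining work does not fit
    if yield_after:
        after = Y                      # a yield resets the virtual clock
    return CallSitePlan(yield_before, yield_after, after)
```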


6.3  Global  Placement  with  Feeley  Yielding 

It  is  possible  to  improve  over  call-return  yielding  by  giving  types  to  functions  that  more  precisely 
capture  their  timing  behavior.  For  example,  by  analogy  with  TALres,  we  might  write  the  type 

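$$\forall\alpha{:}\mathbb{N}.\ \forall\rho{:}\mathrm{TD}.\ \{\mathtt{eax}{:}B_4,\ \mathtt{esp}{:}(\{\mathtt{eax}{:}B_4,\ \mathtt{esp}{:}\rho,\ \mathtt{ck}{:}\alpha\} \to 0) \times \rho,\ \mathtt{ck}{:}k + \alpha\} \to 0$$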

to describe a function that takes time k. Quantifying over the amount of time remaining on return
expresses the fact that this function returns with all but k of its initial virtual clock remaining,
whatever that value happens to be. There is a problem, however: a function of this type cannot
yield! To see why, note that the function must execute its return instruction with α + 1 remaining
on the virtual clock; but as far as the function knows, α could be any natural number. In particular,
α might be larger than Y, but Y is the largest clock value the function can ever ensure after it has
performed a yield instruction.

In reality, of course, α will never be larger than Y; in fact, the initial clock value of k + α can
be at most Y − 1, since the call instruction itself consumes one unit of a clock that never exceeds Y.
Hence, if the function yields, the resulting clock value of Y is guaranteed to be
greater than or equal to α + 1, allowing the function to return. As discussed in Section 3.3.3, code
blocks in MiniTALT-R are permitted to depend on constraint assumptions; the addresses of such
blocks are given guarded types so that they cannot be executed unless the constraints are satisfied.
For example, if I decide the type of a function should be

$$\forall\alpha{:}\mathbb{N}.\ \forall\rho{:}\mathrm{TD}.\ (k + \alpha \le Y - 1) \Rightarrow \{\mathtt{eax}{:}B_4,\ \mathtt{esp}{:}(\{\mathtt{eax}{:}B_4,\ \mathtt{esp}{:}\rho,\ \mathtt{ck}{:}\alpha\} \to 0) \times \rho,\ \mathtt{ck}{:}k + \alpha\} \to 0$$

(the same type as the previous attempt at a function of cost k, except for the guard), then I add
the hypothesis (k + α ≤ Y − 1) true to the static context when typing the function's code. This
hypothesis will then be available for use in proving formulas true within the function body. In
particular, in order for the function to return after a yield, I need to show that 1 + α ≤ Y. This is
especially easy when k ≥ 1, since (using the ordering axioms, monotonicity and transitivity) I can
reason as follows:

$$1 + \alpha \le k + \alpha \le Y - 1 \le Y$$

As  a  matter  of  fact,  a  function  with  the  above  type  need  not  yield  immediately  before  it  returns, 
because  a  stronger  fact  holds: 
Proposition 6.1 If 0 < k < Y, then (α:N, (k + α ≤ Y − 1) true) ⊢ 1 + α ≤ Y − k true.


Proof Sketch: Let Δ be the context in the judgment to be derived. Using commutativity, the
addition axiom and reflexivity of ordering, Δ ⊢ (k − 1) + (1 + α) ≤ k + α true. Using the addition
axiom and reflexivity of ordering, Δ ⊢ Y − 1 ≤ (k − 1) + (Y − k) true. Invoking the hypothesis in Δ
and using transitivity twice, we get Δ ⊢ (k − 1) + (1 + α) ≤ (k − 1) + (Y − k) true. By the cancellation
rule, Δ ⊢ 1 + α ≤ Y − k true, as required.

Alternatively, observe that the formulas on the left and right of the turnstile in this judgment
have the same interpretation as polynomial constraints in the sense of Chapter 4, namely
α − Y + k + 1 ≤ 0. The soundness results of Chapter 4 imply that the judgment is derivable
(and is in DLP₁).

End  of  Sketch. 
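Because hypothesis and conclusion reduce to the same linear constraint, a brute-force check over small values makes a quick sanity test. This is not a substitute for the derivation above; the function name is hypothetical and the check only covers Y up to a chosen bound.

```python
def proposition_6_1_holds(max_Y: int) -> bool:
    """Check Proposition 6.1 on every instance with Y <= max_Y (finite test only).

    For all 0 < k < Y and all natural alpha satisfying k + alpha <= Y - 1,
    the conclusion 1 + alpha <= Y - k must hold.
    """
    for Y in range(1, max_Y + 1):
        for k in range(1, Y):
            # The hypothesis k + alpha <= Y - 1 bounds alpha by Y - 1 - k.
            for alpha in range(0, Y - k):
                assert k + alpha <= Y - 1
                if not 1 + alpha <= Y - k:
                    return False
    return True
```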


A consequence of this proposition is that a function with the type given above may execute up
to k instructions between its last yield and its final ret. If j instructions have been executed since
the last yield and j ≤ k, then the virtual clock will read Y − j. It follows that Y − j ≥ Y − k ≥ 1 + α,
making a return instruction well-typed.

As was the case in our discussion of call-return yielding, the function type just examined does
not bound the number of instructions executed by a function. It merely guarantees that any
function of that type that takes more than k instructions will yield after executing at most k
instructions, and that if such a function does yield, the last time it does so is at most k instructions
before it returns. By placing yields appropriately, any function can be made to obey these criteria.

Once  again,  an  interesting  special  case  arises  if  the  value  of  k  is  fixed  for  all  functions  in  the 
program:  in  this  case,  the  result  is  essentially  the  yield-placement  strategy  described  by  Feeley 
[23].  Feeley,  whose  motivation  was  placing  checkpoints  in  a  program  to  detect  interrupts,  named 
his  strategy  balanced  polling.  (Feeley  also  inspired  my  use  of  the  term  call-return  yielding.)  I  choose 
to  refer  to  the  yielding  scheme  I  have  just  described  as  Feeley  yielding,  and  I  follow  Feeley  in 
using  the  letter  E  to  denote  the  fixed  value  we  have  chosen  for  k.  The  major  advantage  of  Feeley 
yielding  is  that  functions  that  contain  no  loops  or  function  calls  and  are  shorter  than  E  instructions 
need  not  yield  at  all  (whereas  in  call-return  yielding  every  function  must  yield).  Further,  from  the 
caller's  point  of  view,  any  function  appears  to  cost  exactly  E  instructions.  Thus  if  E  is  small 
enough  compared  to  Y,  several  function  calls  may  occur  in  succession  without  the  caller  having 
to  yield  in  between. 
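The caller-side arithmetic of Feeley yielding can be made concrete with a small counter. This sketch is illustrative only: it assumes the call instruction costs one tick on top of the callee's apparent cost E, and it ignores the caller's own instructions between calls. The function name is hypothetical.

```python
def calls_without_yield(clock: int, E: int) -> int:
    """Back-to-back calls a caller can make under Feeley yielding before it must yield.

    Every callee appears to cost exactly E; the call instruction is assumed to
    cost one more tick, and the caller's own work between calls is ignored.
    """
    calls = 0
    while clock >= E + 1:      # enough for the call plus a callee entry of E + alpha
        clock -= E + 1         # the callee returns with alpha = clock - E - 1
        calls += 1
    return calls
```

For instance, starting from a fresh clock of Y = 100 with E = 9, ten calls fit before the caller has to yield.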

A sample MiniTALT-R program fragment using the Feeley yielding strategy is shown in
Figure 6.2. The function in the figure is a recursive function to compute Fibonacci numbers; it was
hand-coded in MiniTALT-R and is displayed in approximately Intel assembler syntax. Note that
the  function  has  a  "short  path"  corresponding  to  the  case  where  the  argument  is  less  than  or  equal 
to  1,  and  a  "long  path"  that  performs  two  recursive  calls  if  it  is  not.  Notice  that  the  short  path 
does  not  need  to  yield  (of  course,  this  depends  on  E  being  chosen  large  enough).  The  long  path 
must  yield  before  the  first  recursive  call,  and  between  the  last  call  and  the  final  return  instruction. 
This  is  typical  of  Feeley  yielding,  since  any  function  might  start  out  with  as  little  as  E  on  the  clock, 
but  any  callee  requires  at  least  E;  similarly,  no  callee  can  be  assumed  to  return  with  more  than 
Y  —  E  —  1  on  the  clock,  but  the  caller  cannot  return  without  at  least  Y  —  E.  Notice,  however,  that 
no  yield  is  needed  in  between  the  two  recursive  calls  (again  assuming  appropriate  values  for  Y 
and  E). 
Note: this example assumes that E ≥ 4 and that Y ≥ 2E + 8.

    fib:
            // ck: E + α, (E + α <= Y - 1) true
            cmp   eax, 1            // n <= 1?
            ja    L1
            mov   eax, 1            // Return 1
            // ck: E - 3 + α
            ret
    L1:
            // ck: E - 2 + α
            push  eax
            sub   eax, 1
            // ck: E - 4 + α
            yield
            // ck: Y
            call  fib               // Compute fib(n-1)
            // ck: Y - E - 1
            pop   ecx
            push  eax
            mov   eax, ecx
            sub   eax, 2
            // ck: Y - E - 5
            call  fib               // Compute fib(n-2)
            // ck: Y - 2E - 6
            pop   ecx
            add   eax, ecx          // eax := fib(n-1) + fib(n-2)
            // ck: Y - 2E - 8
            yield
            // ck: Y
            ret                     // Return

Figure 6.2: Fibonacci using Feeley Yielding
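The clock annotations in Figure 6.2 can be replayed with concrete numbers as a check on the arithmetic. The sketch below is hypothetical and assumes every instruction (including call and ret) costs one tick, that a yield resets the clock to Y, and that each call consumes E + 1 from the caller because the callee's Feeley type charges exactly E.

```python
def replay_fib(Y: int, E: int, alpha: int) -> dict:
    """Replay the clock annotations of Figure 6.2 for concrete Y, E and alpha.

    Assumptions: unit-cost instructions, yield resets the clock to Y, and a
    call costs the caller E + 1 (one tick for the call, E for the callee).
    """
    assert E >= 4 and Y >= 2 * E + 8 and E + alpha <= Y - 1
    short_ret = E + alpha - 3            # cmp, ja (not taken), mov
    assert short_ret >= alpha + 1        # short path may ret without yielding
    at_L1 = E + alpha - 2                # cmp, ja (taken)
    before_yield1 = at_L1 - 2            # push, sub
    ck = Y                               # yield
    after_call1 = ck - (E + 1)           # call fib
    before_call2 = after_call1 - 4       # pop, push, mov, sub
    after_call2 = before_call2 - (E + 1) # call fib
    before_yield2 = after_call2 - 2      # pop, add
    assert min(before_yield1, before_yield2) >= 0
    ck = Y                               # yield
    assert ck >= alpha + 1               # long path may now ret
    return {"short_ret": short_ret, "at_L1": at_L1,
            "before_yield1": before_yield1, "after_call1": after_call1,
            "before_call2": before_call2, "after_call2": after_call2,
            "before_yield2": before_yield2}
```

The same replay applies to Figure 6.5 with L in place of Y, reading each value as the minor clock.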
6.4  Exceptional  Placement

A  simple  heuristic  suffices  for  exceptional  yield  placement.  In  particular,  since  it  is  often  unknown 
at  the  site  of  a  raise  expression  which  handler  is  being  invoked,  the  best  solution  is  probably  to 
use a fixed initial assumption for all handler blocks and treat raise expressions accordingly. If
the  initial  condition  of  all  exception  handlers  is  taken  to  be  H,  then  the  requirement  to  generate  a 
raise  is  clearly  H  plus  the  cost  of  raising  the  exception  (a  few  instructions). 

There  is  room  for  clever  improvement  of  this  method:  if  a  raise  occurs  in  a  context  where  the 
current  handler  can  be  statically  predicted,  then  it  may  be  possible  to  avoid  yielding  before  raising 
the  exception  if  the  handler  block  is  short;  however,  if  a  handler  might  be  invoked  in  a  context 
where  its  identity  is  unknown,  its  initial  requirement  had  better  be  at  most  H.  It  does  not  seem 
likely  that  any  serious  advantage  can  be  gained  from  this  flexibility,  so  I  have  not  investigated  it. 


6.5  Clocks  and  Polling 

The  yield  placement  strategies  I  have  discussed  are  straightforward  and  easy  to  implement,  but 
they fall well short of the ideal goal of yielding exactly once for every Y other instructions
executed. The reason is that, while the changes in the virtual clock can be precisely tracked over
straight-line  code  or  tree-structured  code,  this  precision  cannot  be  carried  across  extended  basic 
block  boundaries.  Once  the  yield  period  Y  is  larger  than  the  length  of  the  longest  extended  basic 
block  in  the  program,  one  cannot  expect  that  increasing  it  any  more  will  continue  to  lower  the 
actual  frequency  with  which  the  program  will  yield  under  these  strategies. 

One  possible  direction  for  further  refinement  is  to  enrich  the  static  reasoning  capabilities  of 
the Talt-R type system, so that it can capture increasingly complex coding idioms, including
loops and recursion. This is the approach taken in LXres, where the equivalent of the static term
language includes sum, product and inductive kinds (inherited from LX [17]) and primitive recursion
in addition to basic arithmetic. Unfortunately, the potential benefits of this kind of system are
difficult  to  realize  without  significant  contributions  from  the  programmer.  Fundamentally,  any 
improvement  along  the  static  reasoning  axis  involves  two  tightly  coupled  areas  of  simultaneous 
development:  that  of  more  and  more  sophisticated  program  analyses  to  detect  opportunities  for 
avoiding  yields,  and  that  of  more  and  more  expressive  type  systems  to  certify  that  the  resulting 
optimized  programs  are  still  safe. 

An  alternative  to  improving  the  static  reasoning  capabilities  of  the  language  is  to  rely  to  some 
extent  on  dynamic  mechanisms.  That  is,  rather  than  implementing  static  analyses  and  compiler 
passes  that  safely  hoist  yield  instructions  out  of  loops,  one  can  generate  programs  that  keep  track 
of  time  as  they  run  and  yield  only  when  needed.  Of  course,  some  static  reasoning  is  needed  to 
certify  the  correctness  of  the  instruction  counting,  but  it  turns  out  that  this  is  not  difficult.  In  fact, 
it  is  substantially  easier  than  beefing  up  the  type  system's  logical  power  to  the  point  of  being  able 
to  handle  real  programs. 

Here is the idea: Let the program use one of the machine's general-purpose registers to maintain
a dynamic approximation of the number of instructions remaining until the next yielding
operation  is  due  to  occur.  This  approximation  is  maintained  by  periodically  subtracting  from  the 
register  until  it  becomes  zero  (or  inconveniently  close  to  zero);  when  that  happens,  the  program 
must  assume  it  has  run  out  of  time.  It  yields,  resets  the  register  and  continues.  I  call  this  behavior 
polling. 
6.5.1  Clocks

To  implement  polling,  I  reserve  one  general-purpose  register  for  timing  purposes.  I  will  use  the 
name rck for this register and refer to it as the clock register (to distinguish it from the
pseudoregister ck, the virtual clock). Note that although I give a descriptive name to the clock register for
the  sake  of  presentation,  there  is  nothing  special  about  this  register  as  far  as  the  type  system  is 
concerned.  In  fact,  it  is  not  strictly  necessary  to  store  the  value  of  the  clock  register  in  a  register  at 
all:  it  would  also  be  reasonable  to  stack-allocate  it  and  save  the  register  for  other  uses.  It  is  perhaps 
most  helpful  to  think  of  the  name  'rck'  as  referring  to  the  role  played  by  a  certain  register,  rather 
than  to  the  register  itself. 

The  purpose  of  the  clock  register  is  to  store  an  approximation  of  the  number  of  clock  cycles 
left  before  the  next  yield  must  happen.  In  particular,  programs  will  maintain  the  invariant  that  the 
value  of  rck  is  always  less  than  or  equal  to  the  value  of  the  virtual  clock.  A  special  significance  is 
attached  to  the  difference  between  these  two  quantities  (or  the  best  available  static  approximation 
thereof):  this  is  the  maximum  number  of  nonyielding  instructions  the  program  can  execute  before 
some  action  must  be  taken  to  maintain  the  invariant,  either  by  yielding  or  by  decreasing  the  value 
of  rck. 

For  simplicity,  let  us  assume  that  updates  to  the  clock  register  occur  in  a  highly  stereotyped 
pattern:  At  points  in  the  program  where  the  virtual  clock  value  cannot  be  proven  to  exceed  the 
value  of  rck,  a  certain  fixed  quantity,  call  it  L,  is  subtracted  from  the  register.  If  the  new  value  is 
negative,  then  the  program  assumes  it  has  run  out  of  time,  performs  a  yield,  and  sets  the  register 
to Y − L; if it is nonnegative, then the virtual clock now exceeds the register's value by at least L
and  execution  can  proceed.  The  technique  will  be  most  effective  if  L  is  close  to  the  length  of  the 
longest  extended  basic  block  in  the  program,  since  it  is  at  extended  basic  block  boundaries  that 
precision  tends  to  be  lost. 
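The stereotyped pattern can be simulated to see how rarely real yields occur and that the invariant between the clock register and the virtual clock is preserved. The sketch is hypothetical: it assumes exactly L unit-cost instructions between checks, a check that costs nothing, and a program that starts just after a yield with rck initialised to Y − L.

```python
def simulate_polling(num_checks: int, Y: int, L: int) -> int:
    """Simulate the stereotyped polling pattern; return the number of real yields.

    Assumptions: exactly L non-yielding unit-cost instructions run between
    clock checks, the check itself is free, and execution starts just after
    a yield with the clock register set to Y - L.
    """
    assert 0 < L <= Y
    ck = Y           # virtual clock, just after a yield
    rck = Y - L      # clock register, so that ck - rck >= L
    yields = 0
    for _ in range(num_checks):
        ck -= L                  # run one stretch of L instructions
        assert 0 <= rck <= ck    # invariant: rck never exceeds the virtual clock
        rck -= L                 # the clock check subtracts L
        if rck < 0:              # out of time: yield and reset the register
            ck = Y
            rck = Y - L
            yields += 1
        assert ck - rck >= L     # enough slack for the next stretch
    return yields
```

With Y = 4L, as in the picture described for Figure 6.3, twelve checks produce three yields: only every fourth check yields.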

The effect of this technique on yield timing is depicted in Figure 6.3. The graph on top shows
the value of the virtual clock as a function of time during the imagined execution of some program
compiled using direct yield placement as described earlier. The downward-sloping portions of the
graph show the steady ticking of the virtual clock as nonyielding instructions are executed; the
virtual clock value jumps back to Y each time the program yields.

Clearly, this program makes rather ineffective use of the time it is given between yields. The
graph on the bottom shows the same program, modified to use polling: each program point that
performed a yield in the upper graph now decrements the instruction counter instead, yielding only
when that value gets close to zero. Each decrement subtracts L from the counter; since in this
picture Y is approximately 4L, only every fourth clock check results in a yield. In practice, the
ratio Y/L is much larger than four, giving an even more dramatic decrease in yield frequency.

In general, if Y = M · L, then each yield period (of Y instructions) can be thought of as M
minor yield periods of L instructions each. The act of decrementing the clock register and yielding
if necessary is performed at least once per minor yield period, and every M'th time incurs an
ordinary yield. To highlight this relationship, I call the sequence of instructions that updates the
clock register a minor yield; the one of every M minor yields at run time that must perform a yield
instruction is called a major yield. The task of yield placement is now reduced to the placement
of minor yields; they must occur at least once every L instructions, and we will see that reasoning
about the amount of time remaining before the next minor yield is not much harder than reasoning
about the virtual clock itself. Since L is much closer to the lengths of actual basic blocks than Y,
the loss of precision associated with each join point in a program will be smaller. Moreover, the
cost of a minor yield is so much less than that of a major yield that the overhead of the instruction
counting is insignificant.

[Plot omitted. Legend: virtual clock; clock register.]
Figure 6.3: Yields Under a Polling Strategy


    YIELD =
            // α:N, rck:S(α), ck: 2 + α
            subjae rck, rck, (L + 2), end
            // rck:int, ck: α
            yield
            // ck: Y
            mov   rck, (Y-L-3)
            // α' = Y - L - 3; rck:S(Y - L - 3),
            // ck: Y - 1 = L + (2 + α')
    end:
            // α':N, rck:S(α'), ck: L + (2 + α')

Figure 6.4: Code for a Minor Yield


6.5.2  Minor  Yields 

A MiniTALT-R implementation of a minor yield is shown in Figure 6.4. Ignoring the type
annotations for a moment, the effect of this code is clear. The subjae instruction decrements the clock
register by L + 2. If the result is nonnegative, then execution continues at the label end; if the
result of the subtraction is negative, a true yield is performed before end is reached. The typing
annotations show that if, for some static term α, the clock register initially holds the value α and
the virtual clock shows 2 + α remaining, then the code after the end label may assume that the
clock register contains some value α' such that the virtual clock reads L + (2 + α'). I will use the
name YIELD to refer to this code sequence.
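Running YIELD on concrete values shows that both exits establish the same postcondition. This is a hypothetical sketch: the instruction costs are inferred from the annotations in Figure 6.4 (subjae lowers the virtual clock by 2, mov by 1, and yield resets it to Y) rather than taken from a specification.

```python
def minor_yield(rck: int, ck: int, L: int, Y: int) -> tuple:
    """Run the YIELD sequence of Figure 6.4 on concrete values (illustrative).

    Costs are inferred from the figure's annotations: subjae lowers the
    virtual clock by 2, mov by 1, and yield resets it to Y.
    Precondition: minor clock 0, i.e. ck == 2 + rck.
    Returns (new_rck, new_ck, did_major_yield).
    """
    assert rck >= 0 and ck == 2 + rck and L + 3 <= Y
    new_rck = rck - (L + 2)   # subjae rck, rck, (L + 2), end
    ck -= 2
    if new_rck >= 0:          # nonnegative: jump straight to end
        return new_rck, ck, False
    ck = Y                    # yield
    new_rck = Y - L - 3       # mov rck, (Y-L-3)
    ck -= 1
    return new_rck, ck, True
```

On either path the result satisfies ck = L + (2 + rck), matching the annotation at end.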

6.5.3  The  Minor  Clock 

The informal description of the relationship between the clock register and the virtual clock must
now be made precise. In order to ensure that a minor yield is always possible, programs maintain
the invariant that the clock register rck always has some singleton type S(t) and the static
approximation to the virtual clock is always t' + (2 + t) for some other term t'. When this is the case
I will say t' is the value of the minor clock. Intuitively, the minor clock captures the number of
instructions that may be executed before the next minor yield. Notice that in straight-line code, the
minor clock behaves just like the virtual clock in the sense that it decrements with every instruction
(provided it is initially positive). More formally, the following rule for the add instruction is
derivable:

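$$\frac{\Gamma(\mathtt{rck}) = S(t) \quad \Gamma(\mathtt{ck}) = (1 + t') + (2 + t) \quad \Delta;\Psi;\Gamma \vdash o_1 : \mathtt{int} \quad \Delta;\Psi;\Gamma \vdash o_2 : \mathtt{int} \quad \Delta;\Psi;\Gamma \vdash d : \mathtt{int} \Rightarrow \Gamma' \quad \Delta;\Psi;\Gamma'\{\mathtt{ck}{:}t' + (2 + t)\} \vdash I}{\Delta;\Psi;\Gamma \vdash \mathtt{add}\ d, o_1, o_2;\ I}$$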

This rule shows how to type an add instruction under the assumption that the minor clock is
1 + t'. Note that as long as the destination d is not rck, the continuation I will be typed under the
assumption that rck still has type S(t), meaning that the new minor clock is just t'. Similar "minor
clock rules" can be derived for all the instructions of Talt-R except for yield. Furthermore, the
typing annotations in Figure 6.4 suggest that (if one ignores the syntactic inconvenience that it
involves multiple blocks in MiniTALT-R), YIELD essentially acts like an instruction with a typing
rule like the following:

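$$\frac{\Gamma(\mathtt{ck}) = 2 + t \quad \Delta;\Psi;\Gamma \vdash \mathtt{rck} : S(t) \quad (\Delta, \alpha{:}\mathbb{N});\Psi;\Gamma\{\mathtt{rck}{:}S(\alpha),\ \mathtt{ck}{:}L + (2 + \alpha)\} \vdash I}{\Delta;\Psi;\Gamma \vdash \mathtt{YIELD};\ I}$$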

This rule states that YIELD has the effect of turning a state with any minor clock value into one
where the minor clock is L, although it may change the value of the clock register.

As  I  have  mentioned,  the  fact  that  YIELD  behaves  so  much  like  yield  means  that  the  local, 
global  and  exceptional  placement  strategies  I  previously  discussed  for  yield  should  also  work 
for  YIELD,  tracking  the  minor  clock  instead  of  the  virtual  clock  and  placing  yield  points  every  L 
instructions instead of every Y. When a yielding strategy is adapted to placing minor yields, I call
it  a  polling  strategy.  For  example,  recalling  the  type  of  a  function  under  Feeley  yielding, 

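$$\forall\alpha{:}\mathbb{N}.\ \forall\rho{:}\mathrm{TD}.\ (E + \alpha \le Y - 1) \Rightarrow \{\mathtt{eax}{:}B_4,\ \mathtt{esp}{:}(\{\mathtt{eax}{:}B_4,\ \mathtt{esp}{:}\rho,\ \mathtt{ck}{:}\alpha\} \to 0) \times \rho,\ \mathtt{ck}{:}E + \alpha\} \to 0$$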

and  modifying  it  so  that  it  specifies  the  function's  behavior  with  respect  to  the  minor  clock  instead 
of  the  virtual  clock,  one  gets  the  type  of  a  function  under  Feeley  polling: 

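$$\begin{aligned}\forall\alpha{:}\mathbb{N}.\ \forall b{:}\mathbb{N}.\ \forall\rho{:}\mathrm{TD}.\ (E + \alpha \le L - 1) \Rightarrow\ &\{\mathtt{eax}{:}B_4,\ \mathtt{rck}{:}S(b),\ \mathtt{esp}{:}(\forall b'{:}\mathbb{N}.\ \{\mathtt{eax}{:}B_4,\ \mathtt{rck}{:}S(b'),\ \mathtt{esp}{:}\rho,\ \mathtt{ck}{:}\alpha + (2 + b')\} \to 0) \times \rho,\\ &\ \ \mathtt{ck}{:}(E + \alpha) + (2 + b)\} \to 0\end{aligned}$$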

Notice that, while under Feeley yielding a function called with E + α on the virtual clock returns
with α on the virtual clock, under Feeley polling a function called with E + α on the minor clock
returns  with  a  on  the  minor  clock.  Notice  also  that  the  function  may  change  the  value  of  the  clock 
register;  the  code  at  the  return  address  must  be  well-typed  for  any  possible  value  on  the  clock 
register,  assuming  only  the  relationship  between  the  register  and  the  virtual  clock  that  defines  the 
minor  clock. 

Figure 6.5 shows the Fibonacci function from Figure 6.2 implemented with Feeley polling. This
new function has the type given above, and its code is exactly the same except that yield
instructions have been replaced by the YIELD macro. Notice that every YIELD, and every recursive call,
may change the value of the clock register.
Note: this example assumes that E ≥ 4 and that L ≥ 2E + 8.

    fib:
            // α, b0:N, rck:S(b0),
            // ck: (E + α) + (2 + b0), (E + α <= L - 1) true
            cmp   eax, 1            // n <= 1?
            ja    L1
            mov   eax, 1            // Return 1
            // ck: (E - 3 + α) + (2 + b0)
            ret
    L1:
            // ck: (E - 2 + α) + (2 + b0)
            push  eax
            sub   eax, 1
            // ck: (E - 4 + α) + (2 + b0)
            YIELD
            // b1:N, rck:S(b1)
            // ck: L + (2 + b1)
            call  fib               // Compute fib(n-1)
            // b2:N, rck:S(b2)
            // ck: L - E - 1 + (2 + b2)
            pop   ecx
            push  eax
            mov   eax, ecx
            sub   eax, 2
            // ck: L - E - 5 + (2 + b2)
            call  fib               // Compute fib(n-2)
            // b3:N, rck:S(b3)
            // ck: L - 2E - 6 + (2 + b3)
            pop   ecx
            add   eax, ecx          // eax := fib(n-1) + fib(n-2)
            // ck: L - 2E - 8 + (2 + b3)
            YIELD
            // b4:N, rck:S(b4)
            // ck: L + (2 + b4)
            ret                     // Return

Figure 6.5: Fibonacci using Feeley Polling
    YIELD(F) =
            // α:N, rck:S(α), ck: F + (2 + α)
            subjae rck, rck, (L-F+2), end
            // if taken: rck:S(α'), α = α' + L - F + 2 true,
            //     ck: (F + α) = F + L - F + 2 + α' = L + (2 + α')
            // otherwise: rck:int, ck: F + α
            yield
            // ck: Y
            mov   rck, (Y-L-3)
            // α' = Y - L - 3; rck:S(Y - L - 3), ck: Y - 1 = L + (2 + α')
    end:
            // α':N, rck:S(α'), ck: L + (2 + α')

Figure 6.6: A Minor Yield with F on the Clock


    YIELD(F,R) =
            // α:N, rck:S(α), ck: F + (2 + α)
            subjae rck, rck, (R-F+2), end
            // if taken: rck:S(α'), α = α' + R - F + 2 true,
            //     ck: (F + α) = F + R - F + 2 + α' = R + (2 + α')
            // otherwise: rck:int, ck: F + α
            yield
            // ck: Y
            mov   rck, (Y-R-3)
            // α' = Y - R - 3; rck:S(Y - R - 3), ck: Y - 1 = R + (2 + α')
    end:
            // α':N, rck:S(α'), ck: R + (2 + α')

Figure 6.7: Resetting the Clock from F to R


6.5.4  Tricks  With  Polling 

In  addition  to  reducing  the  difference  between  the  "yield"  period  and  basic  block  size,  polling 
allows  more  precision  than  ordinary  yielding  because  one  has  control  over  how  much  the  clock 
register  is  decremented  with  every  minor  yield.  For  example,  it  seems  to  occur  frequently  that  a 
yield  must  be  placed  at  a  location  where  there  is  known  to  be  some  time  left  on  the  clock.  In  an 
explicit polling scheme, one can take advantage of this by decrementing the clock register by a
smaller amount, in effect saving the unused cycles so that they can be used later. The code in
Figure 6.6 illustrates this.

Of course, it is also possible to decrement the clock register by more than L + 2. In fact, there is
no reason at all that the minor clock must be reset to L at every minor yield; if one finds oneself at
the beginning of a basic block that is of length R (where R ≤ Y − 3), then one can subtract R + 2 from
rck and set the minor clock to exactly what the current block requires. This is accomplished by
the code sequence YIELD(F, R) defined in Figure 6.7. Note that the first two forms of minor yield
are really special cases of this last one: YIELD(F) is simply YIELD(F, L), and the YIELD from
Figure 6.4 is YIELD(0, L). The formal translation in the next chapter will use the two-argument
notation exclusively.
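The general form can be exercised the same way as the basic minor yield, checking that a minor clock of F is turned into a minor clock of R on both exits. The sketch is hypothetical; it reuses the costs suggested by the figures' annotations (subjae costs 2, mov costs 1, yield resets to Y) and only handles R − F + 2 ≥ 0 and R ≤ Y − 3.

```python
def minor_yield_fr(rck: int, ck: int, F: int, R: int, Y: int) -> tuple:
    """Run YIELD(F, R) from Figure 6.7 on concrete values (illustrative).

    Costs as suggested by the figures' annotations: subjae 2, mov 1, and
    yield resets the clock to Y. Precondition: minor clock F, i.e.
    ck == F + (2 + rck). Postcondition: minor clock R, i.e.
    ck' == R + (2 + rck'). Only handles R - F + 2 >= 0 and R <= Y - 3.
    """
    assert rck >= 0 and F >= 0 and ck == F + 2 + rck
    assert R - F + 2 >= 0 and R <= Y - 3
    new_rck = rck - (R - F + 2)   # subjae rck, rck, (R-F+2), end
    ck -= 2
    if new_rck >= 0:              # branch taken: go straight to end
        return new_rck, ck, False
    ck = Y                        # yield
    new_rck = Y - R - 3           # mov rck, (Y-R-3)
    ck -= 1
    return new_rck, ck, True
```

Choosing F = 0 and R = L reproduces the behaviour of the sequence in Figure 6.4.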

Using this precise minor yield in conjunction with yield-on-jump and call-return yield placement
strategies results in a polling strategy that I call precise yield-on-jump. Under this strategy,
every basic block in the program begins with a minor yield that "reserves" exactly the right number
of minor clock cycles for that block. While this does introduce more minor yields than would
be needed under, say, forward propagation and Feeley polling, it eliminates all of the error
associated with join points. The only "lost cycles" now occur at major yields. A major yield happens
when the cycles remaining on the virtual clock (there will nearly always be some left) are insufficient
for the current basic block; these left-over cycles cannot be used, but the waste is bounded
by the length of the longest basic block in the program.
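The bound on wasted cycles can be seen in a small simulation. This is a hypothetical sketch that ignores the cost of the minor yields themselves and assumes the blocks execute in the order given; each block reserves exactly its own length, and a major yield occurs only when the leftover clock cannot cover the next block.

```python
from typing import List, Tuple


def precise_yield_on_jump(block_lengths: List[int], Y: int) -> Tuple[int, List[int]]:
    """Count major yields and cycles wasted when each block reserves its exact length.

    Simplifying assumptions: the minor yield at the head of each block is
    free, and blocks run one after another in the order given.
    Returns (major_yields, cycles wasted at each major yield).
    """
    assert all(0 < r <= Y for r in block_lengths)
    remaining = Y            # cycles left on the virtual clock
    major = 0
    wasted: List[int] = []
    for r in block_lengths:
        if remaining < r:    # left-over cycles cannot cover this block
            wasted.append(remaining)
            major += 1
            remaining = Y
        remaining -= r
    return major, wasted
```

Each wasted amount is strictly smaller than the block that triggered the major yield, and hence smaller than the longest block.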


6.6  Chapter  Summary 

If  programs  written  by  programmers  who  are  ignorant  of  timing  requirements  are  to  satisfy  a 
timing  policy  like  that  of  Talt-R,  yielding  operations  must  be  inserted  into  those  programs  by 
the  certifying  compiler.  This  process  must  balance  the  absolute  and  inviolable  requirements  of  the 
type  system,  which  serves  as  the  proxy  for  the  safety  policy,  with  the  desire  to  yield  as  infrequently 
as  possible  for  the  sake  of  performance. 

I have described a number of techniques and approaches to yield placement, ranging from the very simple to the fairly complex. The more complicated techniques are based not on advanced static analyses but on dynamic instruction counting, an easily understood mechanism that offers low yield frequencies with a fairly small investment in type system complexity and low performance overhead.


Chapter  7 

Compilation  of  Lilt 


In this chapter, I will finally give a formal translation from Lilt to MiniTALT-R. The purpose of this formal translation is twofold. First, since it relates any well-typed Lilt program to an equivalent assembly language program, it resolves any ambiguity there may have been in my prose description of the semantics of Lilt language constructs. (Of course, giving an operational semantics for Lilt directly would have served the same need.) Second, and more importantly, it allows me to argue that the type system I propose for MiniTALT-R is sufficiently general to support all the constructs and idioms of a typical high-level programming language. In particular, it demonstrates that the polling technique I described in Section 6.5 is flexible enough that resource bound certification need not get in the programmer's way.

The translation I give here uses Feeley polling for interprocedural yield placement, but is nondeterministic with respect to local yield placement. In other words, there are many different ways to translate any Lilt function, differing in the number and location of minor yields in the MiniTALT-R code. An actual implementation of this translation must resolve the nondeterminism using a heuristic such as the ones I described in Chapter 6. (The prototype compiler I have implemented uses forward propagation.)

Although the implications of polling are the main point of this thesis, the formal translation I give in this chapter addresses all aspects of type-directed compilation of Lilt. In particular, I give a complete translation from Lilt types to Talt-R types, and I show how to compile all the primitive operations of Lilt. This makes the translation as a whole rather technical. Before giving the translation rules themselves, therefore, I must take some time to introduce some conventions and notation.


7.1  Type-Directedness 

Formal translations between languages generally come in two flavors: syntax-directed and type-directed.¹ Syntax-directed translations are the more naive variety: they are defined recursively (that is, by induction) over the syntax of the source language, generally using little or no context information. A syntax-directed translation usually applies to any term, well-typed or not; the static correctness theorem for the translation states that if a source term is well-typed, then its translation is well-typed. On the other hand, type-directed translations are (roughly speaking) defined by inference rules that are constructed to closely mirror the typing rules of the source language; they are often thought of as being defined by induction over typing derivations, rather than over terms. Because of this, it is usually very easy to prove that a term may be translated if and only if it is well-typed, and not very difficult in principle to prove that its translation is well-typed in the target language.

¹ There is a third kind, called an elaboration, that differs from both of these in that it is used to define the static semantics of the source language in terms of the target. The archetypical elaboration is the Harper-Stone interpretation of Standard ML [33]; the translation of EXTALT-R to XTALT-R performed by the certifying assembler (but not discussed in this thesis) is an elaboration.

Although  a  syntax-directed  translation  is  often  simpler  to  define  and  implement,  there  are 
many  cases  where  it  simply  does  not  make  sense  to  use  one.  For  instance,  if  the  way  a  term  is 
translated  ever  depends  on  the  type  of  one  of  its  subterms,  then  it  is  usually  advisable  to  define 
the  translation  by  induction  on  typing  rather  than  syntax.  Type-directed  translations  are  also 
called  for  when  the  target  language  is  explicitly  typed,  particularly  if  the  target  requires  typing 
annotations  in  places  where  the  source  language  does  not.  This  latter  case  clearly  arises  when 
translating  a  typed  language  like  Lilt  into  explicitly-typed  assembly  language:  the  assembly  code 
for,  say,  a  conditional  statement  will  contain  at  least  one  label,  which  must  be  annotated  with  a 
type  even  though  the  relevant  typing  information  is  not  explicitly  present  in  the  source  program. 
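To make the last point concrete, here is a small sketch of a type-directed translation for a hypothetical toy language with integer and boolean literals and a conditional. It is not Lilt, and the output is pseudo-assembly rather than MiniTALT-R; the "ann" comments stand in for label type annotations. The translator computes each subterm's type as it recurses, because the join label after the conditional must be annotated with the type of the value in eax, which the source term never states.

```python
import itertools

_fresh = itertools.count()  # supplies unique suffixes for generated labels


def translate(e):
    """Translate e, returning (type, instructions); reject ill-typed terms.

    Toy source language (hypothetical): ("int", n) | ("bool", b) |
    ("if", cond, then_branch, else_branch). Every expression leaves its
    value in eax. Label lines carry an "ann" comment giving the register
    types the label expects.
    """
    tag = e[0]
    if tag == "int":
        return "int", [f"mov eax, {e[1]}"]
    if tag == "bool":
        return "bool", [f"mov eax, {int(e[1])}"]
    if tag == "if":
        tc, ic = translate(e[1])
        tt, it = translate(e[2])
        te, ie = translate(e[3])
        if tc != "bool" or tt != te:
            raise TypeError("ill-typed conditional")
        n = next(_fresh)
        else_lbl, join_lbl = f"else{n}", f"join{n}"
        return tt, (ic + ["cmp eax, 0", f"je {else_lbl}"]
                    + it + [f"jmp {join_lbl}"]
                    + [f"{else_lbl}:  ; ann {{}}"] + ie
                    # The join label needs the branch type tt, which is
                    # known only from typing the subterms.
                    + [f"{join_lbl}:  ; ann {{eax: {tt}}}"])
    raise ValueError(f"unknown expression {tag!r}")


ty, code = translate(("if", ("bool", True), ("int", 1), ("int", 2)))
print(ty)
print("\n".join(code))
```

A purely syntax-directed version could emit the same instructions, but it would have no way to fill in the annotation on the join label.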

It may be a little surprising, then, that Lilt may (I conjecture) be translated to MiniTALT-R by a syntax-directed translation. This is so because MiniTALT-R (as opposed to EXTALT-R) is implicitly typed, so the translation does not have to generate any typing annotations. Furthermore, it happens to be the case that the (concrete) machine instructions implementing any Lilt expression can be computed independently of the types of any of its subterms. However, the translation I give in this chapter is supposed to be an abstract stand-in for the one implemented by my compiler, and that implementation targets EXTALT-R, not MiniTALT-R; because of the explicit typing annotations (and coercions) needed in EXTALT-R, my actual Lilt compiler is type-directed. Therefore, I give a type-directed translation in this chapter even though doing so renders the presentation a good deal less concise. I will use the context and typing information available in the setting of a type-directed translation to annotate the MiniTALT-R output with typing information for labels, even though such annotations are not officially part of MiniTALT-R. This will hopefully help make the intended meaning of the generated code more clear.

7.2  Conventions  and  Notations 

7.2.1  Variable  Naming 

For the purposes of my translation from Lilt to Talt, I will make some assumptions about local variable names. First, I assume that local variable names have the following syntax:

s ::= arg(i) | loc(i)

Second, I assume that the context specifying a function's formal parameters has the form $\Gamma_a = [\mathrm{arg}(1){:}\tau_1, \ldots, \mathrm{arg}(m){:}\tau_m]$ and that the list of local variables declared by the function's entry block is always loc(1), …, loc(n). Note that I make these assumptions without any loss of generality, since any Lilt function may be α-varied into this form. With these conventions in place, the name of a local storage location s identifies it as either a function argument or a local variable, and I will show shortly how the TALT operand or destination corresponding to a location may be determined based on its name. Furthermore, it is no longer necessary to write the names of the arguments and local variables where they are declared at the start of the function, so to save space I will write


$$\mathrm{func}(\Delta; [\tau_1, \ldots, \tau_A]; \tau).(\mathrm{enter}(L).e,\ \ell_1 = B_1, \ldots, \ell_m = B_m)$$

instead of

$$\mathrm{func}(\Delta; [\mathrm{arg}(1){:}\tau_1, \ldots, \mathrm{arg}(A){:}\tau_A]; \tau).(\mathrm{enter}(\mathrm{loc}(1), \ldots, \mathrm{loc}(L)).e,\ \ell_1 = B_1, \ldots, \ell_m = B_m)$$

when I define the translation.

$$
\begin{aligned}
|T| &= T_4 & |k_1 \to k_2| &= |k_1| \to |k_2| \\
|\lambda\alpha{:}k.c| &= \lambda\alpha{:}|k|.|c| & |c_1\,c_2| &= |c_1|\,|c_2| \\
|\exists\alpha_1{:}k_1.\cdots\exists\alpha_n{:}k_n.\tau| &= \exists\alpha_1{:}|k_1|.\cdots\exists\alpha_n{:}|k_n|.|\tau|
\end{aligned}
$$

Figure 7.1: Translation of kinds and types (except function types)
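The renaming assumed above is mechanical. A minimal sketch, with hypothetical helper names `normalize_names` and `is_argument` and names represented as plain strings:

```python
def normalize_names(params, local_vars):
    """Rename a function's parameters and locals to the arg(i)/loc(i) convention.

    Hypothetical helper: both lists hold distinct source-level names in
    declaration order; numbering starts at 1 as in the text.
    """
    renaming = {}
    for i, x in enumerate(params, start=1):
        renaming[x] = f"arg({i})"
    for i, x in enumerate(local_vars, start=1):
        if x in renaming:
            raise ValueError(f"{x!r} is declared twice")
        renaming[x] = f"loc({i})"
    return renaming


def is_argument(s):
    """After renaming, the name alone says whether s is an argument or a local."""
    return s.startswith("arg(")


r = normalize_names(["xs", "n"], ["acc", "i"])
print(r)  # {'xs': 'arg(1)', 'n': 'arg(2)', 'acc': 'loc(1)', 'i': 'loc(2)'}
```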

7.2.2  Minor  Clock  Notation 

The translation uses a polling strategy for yielding, so all the code blocks in the MiniTALT-R output must make assumptions about the minor clock that are reflected in their types. To write these types, I will use some notation based on the fiction that there is a single register called mck, analogous to ck, that holds the value of the minor clock. In particular, for MiniTALT-R register file types Γ, define:

$$\Gamma[\mathrm{mck}_u \mapsto t] = \Gamma[\mathrm{esi} \mapsto S(u),\ \mathrm{ck} \mapsto t + (2 + u)]$$

Here u (which will nearly always be a variable) is the constraint term representation of the clock register value. The register esi serves as the clock register. $\Gamma[\mathrm{mck}_u \mapsto t]$ is the register file type that specifies u on the clock register and t on the minor clock, and agrees with Γ on everything else. I will take the liberty of writing register files that specify a static term for mck in a similar way: $\{r_1{:}\tau_1, \ldots, r_n{:}\tau_n, \mathrm{mck}_u{:}t\}$ will denote the register file type $\{r_1{:}\tau_1, \ldots, r_n{:}\tau_n\}[\mathrm{mck}_u \mapsto t]$ as defined above.
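The shorthand can be read as a purely syntactic expansion. The sketch below performs it on a register file type encoded as a Python dict from register names to type strings; that encoding and the function name `with_mck` are illustrative assumptions, not MiniTALT-R syntax.

```python
def with_mck(gamma, u, t):
    """Expand Gamma[mck_u |-> t] into Gamma[esi |-> S(u), ck |-> t + (2 + u)].

    gamma maps register names to type strings (an illustrative encoding).
    Every other register keeps the type it had in gamma.
    """
    expanded = dict(gamma)             # agree with gamma on everything else
    expanded["esi"] = f"S({u})"        # esi holds exactly the value u
    expanded["ck"] = f"{t} + (2 + {u})"
    return expanded


print(with_mck({"eax": "int", "edi": "tau_e"}, "b", "E + a"))
# {'eax': 'int', 'edi': 'tau_e', 'esi': 'S(b)', 'ck': 'E + a + (2 + b)'}
```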


7.3  Types  and  Data  Representation 

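The translation of Lilt kinds and type constructors is defined in Figures 7.1 and 7.2. The translation of kinds is nearly trivial; the only point of interest is that the Lilt kind T is translated as $T_4$, which means that any Lilt value (since it has a type of kind T) will be represented by something that is 32 bits wide. In particular, my translation will not require any run-time type constructor analysis (as in [19, 17, 57]) to compute the sizes of values.

$$
\begin{aligned}
|(\tau_1, \ldots, \tau_m) \to \tau| = {} & \forall\rho_1{:}T_D.\,\forall\rho_2{:}T_D.\,\forall\alpha_f{:}T_4.\,\forall\alpha_h{:}T_4.\,\forall a{:}\mathbb{N}.\,\forall b{:}\mathbb{N}.\ (E + a \le L - 1) \Rightarrow \\
& \{\mathrm{edi}{:}\tau_e,\ \mathrm{ebp}{:}\alpha_f,\ \mathrm{esp}{:}\tau_r \times \sigma_0,\ \mathrm{mck}_b{:}E + a\} \to 0
\end{aligned}
$$

where:

$$
\begin{aligned}
\sigma_0 &= |\tau_1| \times \cdots \times |\tau_m| \times \rho_1 \times \tau_h \times \rho_2 \\
\tau_h &= \alpha_h \wedge \forall b'{:}\mathbb{N}.\{\mathrm{eax}{:}|\tau_{exn}|,\ \mathrm{esp}{:}\rho_2,\ \mathrm{mck}_{b'}{:}H\} \to 0 \\
\tau_e &= \mathrm{sptr}(\tau_h \times \rho_2) \\
\tau_r &= \forall b'{:}\mathbb{N}.\{\mathrm{eax}{:}|\tau|,\ \mathrm{edi}{:}\tau_e,\ \mathrm{ebp}{:}\alpha_f,\ \mathrm{esp}{:}\sigma_0,\ \mathrm{mck}_{b'}{:}a\} \to 0
\end{aligned}
$$

Figure 7.2: Translation of function types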

The translations of base types, products and quantified types are not surprising. Sum types are translated using Talt's singleton and union types: for instance, a value of type $[i_1{:}\tau_1, i_2{:}\tau_2]$ is either a pointer to a pair consisting of the number $i_1$ and a value of type $\tau_1$, or a pointer to a pair consisting of the number $i_2$ and a value of type $\tau_2$. The translation of array types also makes use of singletons: a value of array type is a pair whose first element is the length of the array and whose second element is a pointer to the array data itself.

Unsurprisingly, the treatment of function types is the most complicated part of the type translation. The type of a function must completely capture not only the interprocedural yielding or polling strategy used by the compiler, but also the procedure calling and linkage conventions, which for Lilt include the passing of parameters and the return address, the saving of registers, and the (interprocedural) exception handling mechanism. As the translation in Figure 7.2 indicates, a Lilt function expects to receive its arguments and return address on the stack, and returns its result in eax. The frame pointer register, ebp, is managed using a callee-saves discipline: its initial value, of the unknown type $\alpha_f$, is restored upon exit from the function.

Our treatment of exception handling is very similar to that of the TALx86 Popcorn compiler [45], which in turn appears to be based on the canonical translation into STAL [46]. A Lilt function expects to be passed the current exception pointer in register edi. The exception pointer points to the current exception handler, which is stored in an unknown location on the stack. The type of the stack expected by the function, therefore, consists of the return address (of type $\tau_r$), the m arguments, a portion of unknown type $\rho_1$, the exception handler (of type $\tau_h$), and finally a tail of unknown type $\rho_2$. The handler itself is a pointer to code that can accept a stack of type $\rho_2$; therefore, to raise an exception one may simply move the exception value to be raised into eax, move the exception pointer from edi into esp, and execute a ret instruction.
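The three-instruction raise sequence can be traced on a toy stack. In this hypothetical model (the function `raise_exn` and the list encoding are assumptions), the stack is a Python list with its top at the end, edi holds the list index of the handler slot, and the stack contents follow the layout just described.

```python
def raise_exn(stack, edi, exn):
    """Simulate mov eax, exn / mov esp, edi / ret on a toy stack.

    Hypothetical model: stack is a list whose last element is the top of
    the machine stack, and edi is the index of the slot holding the
    current handler's code pointer.
    Returns (handler, eax, stack_seen_by_handler).
    """
    eax = exn                   # mov eax, exn
    del stack[edi + 1:]         # mov esp, edi: the handler slot is now on top
    handler = stack.pop()       # ret: pop the code pointer and jump to it
    return handler, eax, stack  # the handler receives the tail (rho2)


# Bottom of the stack first: rho2, handler, rho1, arguments, return address.
stack = ["rho2", "HANDLER", "rho1", "arg 2", "arg 1", "RET"]
handler, eax, rest = raise_exn(stack, edi=1, exn="Overflow")
print(handler, eax, rest)  # HANDLER Overflow ['rho2']
```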

The typing of the exception handler itself is a bit complicated: on the one hand, the function must be able to jump to the handler when raising an exception, but on the other hand, the function is responsible for returning the handler to its caller when it exits. The exception handler thus behaves both like an argument or return address (the function requires it to have a certain type) and like a callee-save register (the act of calling a function must not result in the loss of any information about the current handler). This kind of pattern usually calls for bounded quantification; rather than add this feature to Talt-R, I use a known trick for simulating it using ordinary universal quantification and intersection types [60, 12]. Intuitively, the parameter $\alpha_h$ is the "real" type of the exception handler; since the value pointed to by edi has the intersection type $\tau_h$, it has the unknown type $\alpha_h$ but is additionally bounded above by the right conjunct, which is the code pointer type the function requires the handler to have.

$$\kappa ::= \mathsf{just}\ n \mid \mathsf{retplus}\ n$$

$$|\mathsf{just}\ n|_a = n \qquad |\mathsf{retplus}\ n|_a = n + a$$

$$
\begin{aligned}
(\mathsf{just}\ n) - m &= \mathsf{just}(n - m) && \text{if } n \ge m \\
(\mathsf{retplus}\ n) - m &= \mathsf{retplus}(n - m) && \text{if } n \ge m
\end{aligned}
$$

$$
\begin{aligned}
\mathsf{just}\ n &\ge \mathsf{just}\ m && \text{iff } n \ge m \\
\mathsf{just}\ n &\ge \mathsf{retplus}\ m && \text{iff } n - (L - E - 1) \ge m \\
\mathsf{retplus}\ n &\ge \mathsf{just}\ m && \text{iff } n \ge m \\
\mathsf{retplus}\ n &\ge \mathsf{retplus}\ m && \text{iff } n \ge m
\end{aligned}
$$

Figure 7.3: Clock Specifiers

Finally, observe that the translation of function types specifies a dynamic polling discipline for yielding, as described in Section 6.5. The minor clock pre- and postconditions of a function are expressed using the minor clock notation defined in Section 7.2.2. The value of the clock register when the function is called is the static term parameter b. The translated function type also specifies a Feeley-style placement strategy for minor yields: the minor clock upon entry to the function is assumed to be E + a (where a is a static term parameter), and it will be a when the function returns. The exception handler pointed to by edi is expected to require a minor clock of H. The numbers L, E and H are parameters of the translation and have the same meanings as in Chapter 6. As in my earlier discussion of polling, the return address and exception handler must not care about the exact value of the clock register; this is why both $\tau_r$ and $\tau_h$ quantify over the clock register value $b'$.


7.4  Clock  Specifiers 

In MiniTALT-R code produced by the translation, the minor clock at any point within a function will have one of two forms: either it will be a constant, or it will be n + a, where a is the amount that must be present when the function returns. So that the translation rules do not have to mention the variable a, Figure 7.3 introduces clock specifiers, which are a more abstract way of describing the minor clock. The clock specifier just n corresponds to n on the minor clock; retplus n means that the value of the minor clock is n plus whatever is required for the function to return. Given the variable a, $|\kappa|_a$ is the static term representation of the minor clock denoted by κ if the function must return with a on the clock.

The figure also defines the operation of decrementing a clock specifier by an integer constant (κ − m); note that this operation is not always defined. Finally, the partial order ≥ specifies the constraints on clock specifiers that can be soundly inferred. Subtraction and ordering of clock specifiers will be used in the translation rules to determine when minor yields are needed.
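The definitions of Figure 7.3 translate directly into code. The following sketch encodes clock specifiers as two small dataclasses; the names `Just`, `RetPlus`, `value`, `sub` and `geq` are hypothetical, and L and E are passed explicitly because they are parameters of the translation. The values chosen for them below are arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Just:
    n: int  # exactly n on the minor clock


@dataclass(frozen=True)
class RetPlus:
    n: int  # n plus whatever the function needs on the clock to return


def value(k, a):
    """|k|_a: the minor clock denoted by k when the function must return with a."""
    return k.n if isinstance(k, Just) else k.n + a


def sub(k, m):
    """k - m, or None where Figure 7.3 leaves it undefined (n < m)."""
    return type(k)(k.n - m) if k.n >= m else None


def geq(k1, k2, L, E):
    """k1 >= k2 in the partial order of Figure 7.3."""
    if isinstance(k1, Just) and isinstance(k2, RetPlus):
        # a may be as large as L - E - 1, so n must cover m plus that much
        return k1.n - (L - E - 1) >= k2.n
    return k1.n >= k2.n


L, E = 40, 10  # illustrative values for the translation parameters
print(geq(Just(35), RetPlus(5), L, E))  # True:  35 - 29 = 6 >= 5
print(geq(Just(30), RetPlus(5), L, E))  # False: 30 - 29 = 1 <  5
print(sub(RetPlus(3), 4))               # None
print(sub(Just(7), 1))                  # Just(n=6)
```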

The partial ordering and decrement operation on clock specifiers express constraints that can be proven in the Talt-R constraint logic in the context of a translated function. In particular, the type of any code block in a function will associate with the variable a a constraint hypothesis (E + a ≤ L − 1), which is enough to prove any Talt-R constraint derived from the clock specifier notation.

Lemma 7.1 Let $\Delta = (a{:}\mathbb{N},\ (E + a \le L - 1)\ \mathsf{true})$. Then:

1. If $\kappa - m = \kappa'$ then $\Delta \vdash |\kappa|_a = m + |\kappa'|_a$, and this constraint is in $\mathrm{DLP}_0$.

2. If $\kappa_1 \ge \kappa_2$ then $\Delta \vdash |\kappa_2|_a \le |\kappa_1|_a\ \mathsf{true}$, and this constraint is in $\mathrm{DLP}_1$.

Proof: Part (1) is trivial and is left to the reader to check.

For part (2), there are four cases:

Case 1: $\kappa_1 = \mathsf{just}\ n$, $\kappa_2 = \mathsf{just}\ m$ and $n \ge m$. Then $[\![\,|\kappa_2|_a \le |\kappa_1|_a\,]\!] = (m - n \le 0)$, which is in $\mathrm{DLP}_0$ and hence in $\mathrm{DLP}_1$.

Case 2: $\kappa_1 = \mathsf{retplus}\ n$, $\kappa_2 = \mathsf{retplus}\ m$ and $n \ge m$. Similar to the previous case.

Case 3: $\kappa_1 = \mathsf{retplus}\ n$, $\kappa_2 = \mathsf{just}\ m$ and $n \ge m$. Then

$$[\![\,|\kappa_2|_a \le |\kappa_1|_a\,]\!] = (m - (n + a) \le 0) = (-a + (m - n) \le 0),$$

which is in $\mathrm{DLP}_0$.

Case 4: $\kappa_1 = \mathsf{just}\ n$, $\kappa_2 = \mathsf{retplus}\ m$ and $n - (L - E - 1) \ge m$. In this case we must finally use the constraint hypothesis in Δ. Note that

$$[\![\Delta]\!] = \{(E + a \le L - 1)\} = \{(E - L + 1) + a \le 0\}.$$

Now, $|\kappa_1|_a = n$ and $|\kappa_2|_a = m + a$. Thus

$$[\![\,|\kappa_2|_a \le |\kappa_1|_a\,]\!] = (a + (m - n) \le 0).$$

Subtracting the constraint in $[\![\Delta]\!]$ gives $(m - n + L - E - 1 \le 0)$. By assumption, the number on the left-hand side is nonpositive, so we have found a semantic proof of the desired judgment at depth 1, as required.

End of Proof.


7.5  Stacks,  Register  Files  and  Labels 

In order to give typing annotations for the labels in the output of my translation, I must be able to specify the types of all the registers, including the stack pointer, at every one of these program points. More generally, in order to argue that my translation is type-preserving, I must be able to specify the types I intend for the register file and stack at any point in the MiniTALT-R program I produce. This is more technically involved than might be expected, mostly because of the exception-handling constructs of Lilt.

The stack frame layout used by a Lilt function is shown in Figure 7.4. Note that the stack "grows downward" in the diagram just as it does in memory. All function arguments are passed and stored on the stack (above the return address) and all of the function's local variables are stack-allocated. The figure also illustrates the usage of two important registers (ebp and edi) that point into the stack. Register ebp plays its usual role as the frame pointer, except that it is set up to point to the bottom of the stack frame instead of into the middle as is more customary. This is because I wish to address both arguments and local variables using displacements from ebp, and in Talt these displacements are not allowed to be negative. Each function stores its caller's frame pointer at the very bottom of its initial stack frame and reloads this value into ebp before returning. Register edi is the exception pointer; as I have already mentioned, its value is the address of a location on the stack where the current exception handler is stored. Thus at the beginning of a function, edi points somewhere above the function's own stack frame.

Left (before pushing local exception handlers), from the top of the diagram down:
  older frames (edi points somewhere in here)
  Argument n, …, Argument 1
  Return address
  Local variable m, …, Local variable 1
  Saved value of ebp (ebp and esp point here)

Right (after pushing r local handlers), from the top of the diagram down:
  older frames
  Argument n, …, Argument 1
  Return address
  Local variable m, …, Local variable 1
  Saved value of ebp (ebp points here)
  Saved value of edi
  Handler 1, …, Handler r (esp and edi point to Handler r)

Figure 7.4: A Lilt function's stack frame

The  left-hand  side  of  Figure  7.4  shows  the  initial  state  of  a  function's  stack  frame;  in  particular, 
this  frame  has  no  pending  local  exception  handlers.  The  right-hand  side  shows  a  frame  in  which 
r  handlers  have  been  pushed  by  the  function.  Notice  that  before  pushing  the  first  local  exception 
handler,  the  function  saves  the  initial  value  of  edi  on  the  stack;  this  value  must  be  reloaded 
into  edi  when  the  function  returns,  or  any  time  the  non-local  exception  handler  becomes  current 
again.  As  long  as  the  current  exception  handler  is  local  to  the  current  function,  edi  will  have  the 
same  value  as  esp. 
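The bookkeeping for edi can be simulated. In the sketch below, the class `Frame` and its methods are hypothetical; list indices stand in for stack addresses, and the initial contents follow the left-hand side of Figure 7.4. It exercises the invariant stated above: edi equals esp while a local handler is current, and the saved edi is restored once the last local handler is popped.

```python
class Frame:
    """Toy model of Figure 7.4, with list indices standing in for addresses.

    The last list element is the top of the stack (the lowest address in
    memory), so esp is the index of the last element.
    """

    def __init__(self, older, args, local_vars, caller_edi):
        # Bottom to top: older frames, Argument n .. Argument 1, return
        # address, Local variable m .. Local variable 1, saved ebp.
        self.stack = (list(older) + list(reversed(args)) + ["RET"]
                      + list(reversed(local_vars)) + ["saved ebp"])
        self.edi = caller_edi        # points into the older frames
        self.local_handlers = 0

    @property
    def esp(self):
        return len(self.stack) - 1

    def push_handler(self, h):
        if self.local_handlers == 0:
            self.stack.append(("saved edi", self.edi))  # before the first push
        self.stack.append(h)
        self.local_handlers += 1
        self.edi = self.esp          # a local handler is current: edi == esp

    def pop_handler(self):
        if self.local_handlers == 0:
            raise IndexError("no local handler to pop")
        self.stack.pop()
        self.local_handlers -= 1
        if self.local_handlers == 0:
            _, self.edi = self.stack.pop()  # non-local handler is current again
        else:
            self.edi = self.esp


f = Frame(older=["caller data", "caller handler"], args=["a1", "a2"],
          local_vars=["l1", "l2"], caller_edi=1)
f.push_handler("H1")
f.push_handler("H2")
print(f.edi == f.esp, f.stack[f.edi])  # True H2
f.pop_handler()
f.pop_handler()
print(f.edi, f.stack[-1])              # 1 saved ebp
```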

The type of the stack at any point in a Lilt program can be determined using the function ST in Figure 7.5. Intuitively, $ST_{\rho_1,\rho_2,\alpha_f,\alpha_h,a}(\Xi, \Gamma, \tau)$ is the stack type corresponding to a Lilt exception context Ξ and local context Γ, in a function that returns type τ. The subscripts $\rho_1, \rho_2, \alpha_f, \alpha_h, a$ specify some special variables that are allowed to occur free in these types: $\rho_1$ and $\rho_2$ are the two unknown portions of the stack, $\alpha_f$ is the type of the saved value of ebp, $\alpha_h$ is the precise type of the exception handler, and a is the value that must be on the minor clock when the function returns. (To reduce verbosity, these subscripts are elided for occurrences of ST on the right-hand side of each clause when they are the same as on the left-hand side, and are elided on the left-hand side when they do not appear at all on the right.)
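The recursive structure of ST is easier to see in executable form. The sketch below builds a stack type as a list of slots, top first, with symbolic strings for the pieces defined in Figure 7.5. The function name `st`, the list encoding of exception contexts (oldest handler first), and keeping the return type implicit inside tau_r are assumptions made for illustration.

```python
def st(xi, gamma):
    """Stack type of Figure 7.5 as a list of slots, top of the stack first.

    gamma: list of (location, Lilt type) pairs, e.g. [("arg(1)", "int")].
    xi:    list of handler local contexts, oldest first; [] is the empty
           exception context.
    Handler slots are ("handler", expected_stack) pairs; other slots are
    strings naming the pieces of Figure 7.5.
    """
    if not xi:
        args = [f"|{t}|" for s, t in gamma if s.startswith("arg(")]
        locs = [f"|{t}|" for s, t in gamma if s.startswith("loc(")]
        sigma0 = args + ["rho1", "tau_h", "rho2"]
        return ["alpha_f"] + locs + ["tau_r"] + sigma0
    *outer, gamma_h = xi
    if not outer:
        # Exactly one local handler: the saved edi (type tau_e) lies just
        # below it, and the handler expects that slot on top of its stack.
        return [("handler", ["tau_e"] + st([], gamma_h)), "tau_e"] + st([], gamma)
    return [("handler", st(outer, gamma_h))] + st(outer, gamma)


g = [("arg(1)", "int"), ("loc(1)", "bool")]
print(st([], g))
# ['alpha_f', '|bool|', 'tau_r', '|int|', 'rho1', 'tau_h', 'rho2']
```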

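Figure 7.6 shows how to find the types of the registers for any point in a compiled Lilt program. First, $RF_{\rho_1,\rho_2,\alpha_f,\alpha_h,a,u}(\Xi, \Gamma, \tau, t)$ is the register file type associated with the exception context Ξ and local context Γ, assuming τ is the return type of the current function and t is the value of the minor clock. The subscripts $\rho_1, \rho_2, \alpha_f, \alpha_h, a$ are as in the definition of ST, with the addition of u, the static term representation of the clock register.

$$ST_{\rho_1,\rho_2,\alpha_f,\alpha_h,a}(\cdot;\ \Gamma;\ \tau) = \alpha_f \times |\tau_{l_1}| \times \cdots \times |\tau_{l_m}| \times \tau_r \times \sigma_0$$

where:

$$
\begin{aligned}
\Gamma &= [\mathrm{arg}(1){:}\tau_{a_1}, \ldots, \mathrm{arg}(n){:}\tau_{a_n}, \mathrm{loc}(1){:}\tau_{l_1}, \ldots, \mathrm{loc}(m){:}\tau_{l_m}] \\
\sigma_0 &= |\tau_{a_1}| \times \cdots \times |\tau_{a_n}| \times \rho_1 \times \tau_h \times \rho_2 \\
\tau_h &= \alpha_h \wedge \forall b{:}\mathbb{N}.\{\mathrm{eax}{:}|\tau_{exn}|,\ \mathrm{esp}{:}\rho_2,\ \mathrm{mck}_b{:}H\} \to 0 \\
\tau_r &= \forall b'{:}\mathbb{N}.\{\mathrm{eax}{:}|\tau|,\ \mathrm{ebp}{:}\alpha_f,\ \mathrm{edi}{:}\tau_e,\ \mathrm{esp}{:}\sigma_0,\ \mathrm{mck}_{b'}{:}a\} \to 0 \\
\tau_e &= \mathrm{sptr}(\tau_h \times \rho_2)
\end{aligned}
$$

$$ST_{\rho_1,\rho_2,\alpha_f,\alpha_h,a}((\cdot, \Gamma');\ \Gamma;\ \tau) = (\forall b{:}\mathbb{N}.\{\mathrm{eax}{:}|\tau_{exn}|,\ \mathrm{esp}{:}\tau_e \times ST(\cdot; \Gamma'; \tau),\ \mathrm{mck}_b{:}H\} \to 0) \times \tau_e \times ST(\cdot; \Gamma; \tau)$$

where $\tau_h$ and $\tau_e$ are defined as in the first clause.

$$ST((\Xi, \Gamma');\ \Gamma;\ \tau) = (\forall b{:}\mathbb{N}.\{\mathrm{eax}{:}|\tau_{exn}|,\ \mathrm{esp}{:}ST(\Xi; \Gamma'; \tau),\ \mathrm{mck}_b{:}H\} \to 0) \times ST(\Xi; \Gamma; \tau) \qquad (\Xi \neq \cdot)$$

Figure 7.5: Determining the Stack Type

$$RF_{\rho_1,\rho_2,\alpha_f,\alpha_h,a,u}(\cdot, \Gamma, \tau, t) = \{\mathrm{edi}{:}\tau_e,\ \mathrm{ebp}{:}\mathrm{sptr}(\sigma_1),\ \mathrm{esp}{:}\sigma_1,\ \mathrm{mck}_u{:}t\}$$

where:

$$
\begin{aligned}
\tau_h &= \alpha_h \wedge \forall b'{:}\mathbb{N}.\{\mathrm{eax}{:}|\tau_{exn}|,\ \mathrm{esp}{:}\rho_2,\ \mathrm{mck}_{b'}{:}H\} \to 0 \\
\tau_e &= \mathrm{sptr}(\tau_h \times \rho_2) \\
\sigma_1 &= ST_{\rho_1,\rho_2,\alpha_f,\alpha_h,a}(\cdot; \Gamma; \tau)
\end{aligned}
$$

$$RF_{\rho_1,\rho_2,\alpha_f,\alpha_h,a,u}(\Xi, \Gamma, \tau, t) = \{\mathrm{edi}{:}\mathrm{sptr}(\sigma_2),\ \mathrm{ebp}{:}\mathrm{sptr}(\sigma_1),\ \mathrm{esp}{:}\sigma_2,\ \mathrm{mck}_u{:}t\} \qquad (\Xi \neq \cdot)$$

where $\sigma_1 = ST_{\rho_1,\rho_2,\alpha_f,\alpha_h,a}(\cdot; \Gamma; \tau)$ and $\sigma_2 = ST_{\rho_1,\rho_2,\alpha_f,\alpha_h,a}(\Xi; \Gamma; \tau)$.

Figure 7.6: Determining the Register File Type

$$
\begin{aligned}
LL(\Delta, \Xi, \Gamma, \tau, \kappa, [f_1 \mapsto \tau_1, \ldots, f_n \mapsto \tau_n]) = {} & \forall\alpha_1{:}|k_1|.\cdots\forall\alpha_m{:}|k_m|.\,\forall\rho_1{:}T_D.\,\forall\rho_2{:}T_D.\,\forall\alpha_f{:}T_4.\,\forall\alpha_h{:}T_4. \\
& \forall a{:}\mathbb{N}.\,\forall b{:}\mathbb{N}.\ (E + a \le L - 1) \Rightarrow \\
& RF_{\rho_1,\rho_2,\alpha_f,\alpha_h,a,b}(\Xi, \Gamma, \tau, |\kappa|_a)[f_1 \mapsto \tau_1, \ldots, f_n \mapsto \tau_n] \to 0
\end{aligned}
$$

where $\Delta = \alpha_1{:}k_1, \ldots, \alpha_m{:}k_m$.

$$|\mathrm{lbl}(\Delta'; \Xi; \Gamma)|_{\Delta,\tau,\kappa} = LL((\Delta, \Delta'), \Xi, \Gamma, \tau, \kappa, [\,])$$

$$
\begin{aligned}
|\mathrm{hnd}(\Delta'; \Xi; \Gamma)|_{\Delta,\tau} = {} & \forall\alpha_1{:}|k_1|.\cdots\forall\alpha_m{:}|k_m|.\,\forall\rho_1{:}T_D.\,\forall\rho_2{:}T_D.\,\forall\alpha_f{:}T_4.\,\forall\alpha_h{:}T_4. \\
& \forall a{:}\mathbb{N}.\,\forall b{:}\mathbb{N}.\ (E + a \le L - 1) \Rightarrow \\
& \{\mathrm{eax}{:}|\tau_{exn}|,\ \mathrm{esp}{:}ST_{\rho_1,\rho_2,\alpha_f,\alpha_h,a}(\Xi, \Gamma, \tau),\ \mathrm{mck}_b{:}H\} \to 0
\end{aligned}
$$

where $(\Delta, \Delta') = \alpha_1{:}k_1, \ldots, \alpha_m{:}k_m$.

Figure 7.7: Label and Block Types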

Finally, Figure 7.7 shows how to compute types for labels occurring within a translated function body and how to translate Lilt block types. First, $LL(\Delta, \Xi, \Gamma, \tau, \kappa, [f_1 \mapsto \tau_1, \ldots, f_n \mapsto \tau_n])$ is the type of a local label with type parameters given by Δ (this includes both the type parameters of the enclosing function and any additional parameters of the current block), expecting exception handlers described by Ξ, local storage described by Γ, and κ describing the minor clock, where τ again is the return type of the function in which the label appears and the additional type assignments $f_i \mapsto \tau_i$ specify the types of values stored temporarily in registers. The translation of an ordinary block type is easily defined using LL; LL is also used to annotate labels that occur in the interior of a Lilt block. Exception handler blocks are a little different: an exception handler block expects an exception value in eax and H on the minor clock.


7.6  Translating  Operands 


Because of my assumptions about the names of local storage locations, if the total number M of local variables allocated by the current function is known, then the operand corresponding to location s (denoted by $|s|_M$) can be determined from the name s as follows:

$$|\mathrm{loc}(i)|_M = [\mathrm{ebp}+(4i)] \qquad |\mathrm{arg}(i)|_M = [\mathrm{ebp}+(4(1 + M + i))]$$

In the MiniTALT-R syntax used in this thesis, stack operands such as these are written exactly the same as the destinations denoting the same locations. To refer to the destination corresponding
to  the  location  s  I  will  write  |s|^. 

I assume there is an obvious embedding of Lilt function symbols into assembly-level labels, and extend the mapping $|\cdot|_M$ to all Lilt operands as follows:

$$
\begin{aligned}
|n|_M &= \mathrm{im}(n) & |{*}|_M &= \mathrm{im}(0) \\
|\mathsf{tt}|_M &= \mathrm{im}(1) & |f|_M &= f \\
|\mathsf{ff}|_M &= \mathrm{im}(0) & |v@c|_M &= |v|_M
\end{aligned}
$$
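The operand mapping can be sketched as follows. The operand encoding (Python ints for integer constants, strings for locations and function symbols) and the function name `operand` are hypothetical, and the type-application clause is not modeled. The displacements agree with the frame layout of Figure 7.4: ebp points at the saved ebp, the locals lie above it, then the return address, then the arguments.

```python
import re

_LOCATION = re.compile(r"(arg|loc)\((\d+)\)")


def operand(v, M):
    """Translate a Lilt operand to a MiniTALT-R-style operand string.

    M is the number of local variables in the current function.
    Hypothetical encoding: ints are integer constants; "*", "tt", "ff",
    "arg(i)" and "loc(i)" are as in the text; any other string is a
    function symbol, used unchanged as a label.
    """
    if isinstance(v, int):
        return f"im({v})"
    if v in ("*", "ff"):
        return "im(0)"
    if v == "tt":
        return "im(1)"
    m = _LOCATION.fullmatch(v)
    if m:
        i = int(m.group(2))
        # loc(i) is 4i bytes above ebp; arg(i) must also skip the M locals
        # and the return address.
        disp = 4 * i if m.group(1) == "loc" else 4 * (1 + M + i)
        return f"[ebp+{disp}]"
    return v


for v in ["loc(1)", "loc(2)", "arg(1)", 7, "tt", "fact"]:
    print(v, "->", operand(v, M=2))
```

With M = 2 this prints [ebp+4], [ebp+8] and [ebp+16] for the three locations: offset 12 is the return address, which sits between the last local and the first argument.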

7.7  Compiling  Expressions 

In general, a Lilt block may translate to more than one MiniTALT-R block; a Lilt expression will translate to a MiniTALT-R instruction sequence plus zero or more additional blocks. The translation rules will use the letter $\mathcal{S}$ to range over sequences of MiniTALT-R blocks:

$$\mathcal{S} ::= \epsilon \mid \ell{:}\tau = I\ \mathcal{S}$$

To make MiniTALT-R code look more like ordinary assembly code, I will freely concatenate sequences of blocks in the obvious way.

Since the translation is type-directed, its structure follows the typing rules of Lilt rather closely; however, to reduce the clutter on the left side of the turnstile in translation judgments, I collect all the context information for a Lilt expression into one translation context, ranged over by C as shown in Figure 7.8. The figure also shows the syntax for local timing contexts $\mathcal{T}$; a local timing context maps each local label in a Lilt function to the minor clock value that block expects.

$$C ::= (\Phi; \Delta; \Lambda; \Xi; \Gamma; \tau) \qquad \mathcal{T} ::= \ell_1{:}\kappa_1, \ldots, \ell_n{:}\kappa_n$$

Figure 7.8: Translation Contexts

To manipulate the context information collected in a translation context C as required by the translation rules, some notation is required. In particular, if $C = (\Phi; \Delta; \Lambda; \Xi; \Gamma; \tau)$, then define the following:

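• $\mathrm{locs}(C) = \mathrm{dom}(\Gamma)$

• $\mathrm{handlers}(C) = \mathrm{length}(\Xi)$

• $C(\ell) = \Lambda(\ell)$

• $C[s \mapsto \tau'] = (\Phi; \Delta; \Lambda; \Xi; \Gamma[s \mapsto \tau']; \tau)$

• $C \oplus \Delta' = (\Phi; (\Delta, \Delta'); \Lambda; \Xi; \Gamma; \tau)$

• $C \oplus \Gamma' = (\Phi; \Delta; \Lambda; (\Xi, \Gamma'); \Gamma; \tau)$

• $\mathrm{poph}(C) = (\Phi; \Delta; \Lambda; \Xi'; \Gamma; \tau)$, if $\Xi = (\Xi', \Gamma')$

• $|s|_C = |s|_B$, where $\mathrm{dom}(\Gamma) = \{\mathrm{arg}(1), \ldots, \mathrm{arg}(A), \mathrm{loc}(1), \ldots, \mathrm{loc}(B)\}$ (and similarly for destinations)

• $C \vdash c : k$ iff $\Delta \vdash c : k$

• $C \vdash c_1 = c_2 : k$ iff $\Delta \vdash c_1 = c_2 : k$

• $C \vdash v : \tau'$ iff $\Phi; \Delta; \Gamma \vdash v : \tau'$

• $C \models \Gamma'$ iff $\Delta \vdash \Gamma \le \Gamma'$

• $C \models \Xi'$ iff $\Delta \vdash \Xi \le \Xi'$

• $C \models \mathsf{canraise}$ iff $\Delta \vdash \Xi\ \mathsf{handles}\ \Gamma$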
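The context operations are simple tuple manipulations. The sketch below models a subset of them with a frozen dataclass; the class name `Ctx`, its field names and the encodings are hypothetical, and judgments such as $C \vdash v : \tau'$ are not modeled.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Ctx:
    """Toy translation context C = (Phi; Delta; Lambda; Xi; Gamma; tau).

    Hypothetical encoding: gamma is a tuple of (location, type) pairs in
    declaration order; xi is a tuple of handler local contexts with the
    most recently pushed handler last.
    """
    phi: tuple
    delta: tuple
    lam: tuple
    xi: tuple
    gamma: tuple
    tau: str

    def locs(self):                   # locs(C) = dom(Gamma)
        return [s for s, _ in self.gamma]

    def handlers(self):               # handlers(C) = length(Xi)
        return len(self.xi)

    def update(self, s, t):           # C[s |-> t']
        if s not in self.locs():
            raise KeyError(s)
        return replace(self, gamma=tuple((x, t if x == s else u)
                                         for x, u in self.gamma))

    def push_handler(self, gamma_h):  # C (+) Gamma'
        return replace(self, xi=self.xi + (gamma_h,))

    def poph(self):                   # defined only when Xi is nonempty
        if not self.xi:
            raise ValueError("no local handler to pop")
        return replace(self, xi=self.xi[:-1])


c = Ctx((), (), (), (), (("arg(1)", "int"), ("loc(1)", "bool")), "int")
c2 = c.push_handler(c.gamma)
print(c2.handlers(), c2.poph() == c)       # 1 True
print(c.update("loc(1)", "int").gamma[1])  # ('loc(1)', 'int')
```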

The complete translation rules are in Section 7.8. The translation judgment $C; \mathcal{T}; \kappa \vdash e \rightsquigarrow I\ \mathcal{S}$ means that the instruction sequence I, together with the additional blocks $\mathcal{S}$, implements the expression e assuming κ describes the minor clock. The translation is highly nondeterministic: in particular, it makes no commitment to either forward or backward propagation, and does not specify how to determine the initial minor clock requirement for each block within a function. Two translation rules ensure that a minor yield may be inserted before any subexpression, whether it is needed or not:

$$\frac{C;\ \mathcal{T};\ (\mathsf{just}\ m) \vdash e \rightsquigarrow I\ \mathcal{S}}{C;\ \mathcal{T};\ (\mathsf{just}\ n) \vdash e \rightsquigarrow \mathrm{YIELD}(n, m)\ I\ \mathcal{S}} \qquad \frac{C;\ \mathcal{T};\ (\mathsf{just}\ m) \vdash e \rightsquigarrow I\ \mathcal{S}}{C;\ \mathcal{T};\ (\mathsf{retplus}\ n) \vdash e \rightsquigarrow \mathrm{YIELD}(n, m)\ I\ \mathcal{S}}$$

Note that these rules take advantage of the clock register "tricks" discussed in Section 6.5.4, setting the minor clock to an arbitrary value m. The rules do not specify the value of m; in practice an implementation may either use m = L everywhere in a program, as my prototype does, or it may perform some analysis to determine good values for m at each minor yield it generates.

In the rule for translating an intraprocedural jump, the timing context T is consulted to ensure that the target block's clock expectations are met:

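$$\frac{(C(\ell) = \mathsf{lbl}(\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n; H'; \Gamma')) \qquad k - 1 \ge T(\ell) \qquad C \vdash c_i : \kappa_i \qquad C \models \Gamma'[\vec{c}/\vec{\alpha}] \qquad C \models H'[\vec{c}/\vec{\alpha}]}{C; T; k \vdash \mathtt{goto}\ \ell[c_1, \ldots, c_n] \leadsto \mathtt{jmp}\ \ell}$$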

Since the initial minor clock is $k$, it will be $k - 1$ after the jmp instruction. Thus, in order for this rule to apply, $k - 1$ must be greater than or equal to the minor clock value expected by block $\ell$. (The other premises of this rule correspond directly to the premises of the typing rule for goto.) If $k - 1 \ge T(\ell)$ does not hold, then this rule will not apply, but one of the two yielding rules will; thus a well-typed goto expression can always be compiled, possibly by yielding first.
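As a sketch of how an implementation might apply the goto rule and fall back on a yielding rule, the Python function below assumes every clock specifier has the form just n and represents it as the integer n; retplus specifiers, and comparisons between the two forms, are left out. The function name and the `L` parameter (the value a minor yield resets the minor clock to, following the prototype's choice m = L) are illustrative assumptions.

```python
from typing import Dict, List


def translate_goto(k: int, label: str, timing: Dict[str, int], L: int) -> List[str]:
    """Code for `goto label`, with k and timing[label] = T(label) both read as `just n`."""
    need = timing[label]
    if k - 1 >= need:                 # jmp costs one tick: the goto rule applies directly
        return [f"jmp {label}"]
    if L - 1 < need:                  # even a fresh minor clock would not be enough
        raise ValueError(f"{label} expects more than a minor yield can provide")
    # Yielding rule first (minor clock becomes just L), then the goto rule with k = L.
    return [f"YIELD({k}, {L})", f"jmp {label}"]
```

For example, `translate_goto(5, "l", {"l": 5}, L=20)` cannot jump directly (4 < 5) and so emits a minor yield before the jmp.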

The rule for returning from a function takes account of the fact that a clock specifier of $\mathsf{retplus}\ n$ means the minor clock is sufficient to execute $n$ instructions, the last of which may be a ret. It takes a few instructions, however, to get ready to return:

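$$\frac{(\mathrm{locs}(C) = [\mathrm{arg}(1), \ldots, \mathrm{arg}(A), \mathrm{loc}(1), \ldots, \mathrm{loc}(B)]) \qquad C \vdash v : \tau \qquad k - 4 \ge \mathsf{retplus}(0) \qquad (\mathrm{handlers}(C) = 0)}{C; T; k \vdash \mathtt{return}\ v \leadsto}$$

    mov   eax, |v|_C
    pop   ebp
    sfree (4B)
    ret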

The code generated by this rule moves the value to be returned into eax, moves the caller's frame pointer back into ebp, frees the stack space allocated by the function, and finally returns. This takes four instructions, so the rule requires that $k - 4 \ge \mathsf{retplus}(0)$. (This is equivalent to requiring $k \ge \mathsf{retplus}(4)$.) A side condition in this rule requires that $\mathrm{handlers}(C) = 0$; there is a slightly different rule for returning when there are local exception handlers that must be removed from the stack.
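The return rule above and its variant in Section 7.8 for functions with live local handlers differ only in how they restore edi and ebp and how much stack they free. The Python sketch below emits either sequence; it treats the clock specifier retplus n as the plain integer n (a simplification), takes the operand |v|_C as a ready-made string, and the function name is hypothetical.

```python
from typing import List


def translate_return(v: str, B: int, X: int, k: int) -> List[str]:
    """Emit the code of the two return rules.

    v -- the operand |v|_C, already rendered as a string
    B -- number of local slots loc(1) .. loc(B)
    X -- handlers(C), the number of live local handlers
    k -- minor clock, read as `retplus k` (simplified to an int)
    """
    if X == 0:
        code = [f"mov eax, {v}",
                "pop ebp",                          # caller's frame pointer
                f"sfree {4 * B}",                   # locals
                "ret"]
    else:
        code = [f"mov eax, {v}",
                f"mov edi, [esp + {4 * X}]",        # edi saved below the X handler labels
                f"mov ebp, [esp + {4 * (X + 1)}]",  # caller's frame pointer
                f"sfree {4 * (B + X + 2)}",         # handler labels, saved edi and ebp, locals
                "ret"]
    # Side condition k - len(code) >= retplus(0), i.e. k >= retplus(len(code)).
    if k < len(code):
        raise ValueError("minor clock too small: a yielding rule must apply first")
    return code
```

The handler variant is one instruction longer because ebp is reloaded with a mov rather than popped, which is why its side condition is $k - 5 \ge \mathsf{retplus}(0)$.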
Most of the other rules simply decrement the clock specifier k by the appropriate amount before translating their subexpressions. For example, translation of primitive arithmetic is straightforward:

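$$\frac{C \vdash v_i : \mathsf{int}\ \ (i = 1, 2) \qquad C[s \mapsto \mathsf{int}]; T; (k - 3) \vdash e \leadsto I\ S}{C; T; k \vdash \mathtt{let}\ s = +(v_1, v_2)\ \mathtt{in}\ e \leadsto}$$

    mov eax, |v_1|_C
    add eax, eax, |v_2|_C
    mov |s|_C, eax
    I
    S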

(Note, though, that a simple addition takes three MiniTALT-R instructions because all local storage is on the stack. This highlights the need for a better register allocation scheme.) If the translation encounters an addition expression like this one in a Lilt program and the minor clock is less than 3 (that is, if k − 3 is undefined), then it must translate that expression using the appropriate yielding rule. Unfortunately, some Lilt operations can in principle require an arbitrary number of instructions: allocating a tuple of size n requires as many as 2n + 2 instructions, and calling a function with n arguments costs n + E + 3. Since these costs grow without bound in n, they can exceed any fixed minor clock, so it is impossible in general to compile these operations to yield-free instruction sequences. The translation given here ignores these issues, but there is no reason a real compiler cannot be designed to deal with wide tuples and high-arity functions.
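The cost bookkeeping in this paragraph can be illustrated with a greedy forward pass over straight-line code, which is only one of the many strategies the nondeterministic rules permit. In the Python sketch below, the costs are the instruction counts quoted above, E is passed in as a parameter, every minor yield resets the minor clock to just L (as in the prototype), and the operation encoding and function names are hypothetical.

```python
from typing import List, Tuple

Op = Tuple  # ("add",), ("tuple", n) or ("call", n): a hypothetical encoding


def cost(op: Op, E: int) -> int:
    """Instruction counts quoted in the text."""
    kind = op[0]
    if kind == "add":
        return 3                      # mov, add, mov
    if kind == "tuple":
        return 2 * op[1] + 2          # allocating an n-tuple
    if kind == "call":
        return op[1] + E + 3          # calling with n arguments
    raise ValueError(f"unknown operation {kind}")


def place_yields(ops: List[Op], k: int, L: int, E: int):
    """Yield only when the minor clock k cannot cover the next operation."""
    out = []
    for op in ops:
        c = cost(op, E)
        if c > L:
            raise ValueError(f"{op} costs {c} > L: it cannot be compiled yield-free")
        if k < c:                     # k - c is undefined, so a yielding rule is needed
            out.append(("YIELD", k, L))
            k = L                     # YIELD(k, L) leaves the minor clock at just L
        out.append(op)
        k -= c
    return out, k
```

Starting from k = 7 with L = 10, two additions fit, but a following 2-tuple (cost 6) forces a yield first; a call with 8 arguments and E = 1 (cost 12) is rejected outright, which is exactly the wide-call problem described above.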
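$$\frac{\begin{array}{c} \Delta \vdash \tau_i : \mathsf{T}\ \text{for each}\ i \qquad \Delta \vdash \tau : \mathsf{T} \qquad \Delta \vdash A \qquad (\mathrm{dom}(T) = \mathrm{dom}(A)) \\ (\Phi; \Delta; A; \cdot; \Gamma; \tau); T; (\mathsf{retplus}(R - 2)) \vdash e \leadsto I\ S_0 \\ \Phi; \Delta; A; \tau; T \vdash B_i : (A(\ell_i), T(\ell_i)) \leadsto I_i\ S_i\ \ \text{for}\ 1 \le i \le m \end{array}}{\Phi \vdash \mathsf{func}(\Delta; \tau_1, \ldots, \tau_p; \tau).(\mathsf{enter}(L).e,\ \ell_1 = B_1, \ldots, \ell_m = B_m) : \forall\Delta.(\tau_1, \ldots, \tau_p) \to \tau \leadsto}$$

f : |∀Δ.(τ_1, ..., τ_p) → τ| =
    salloc (4L)
    push   ebp
    I
    S_0
ℓ_1 : |A(ℓ_1)|_{Δ,τ,T(ℓ_1)} =
    I_1
    S_1
    ...
ℓ_m : |A(ℓ_m)|_{Δ,τ,T(ℓ_m)} =
    I_m
    S_m

where
    Γ = [arg(1):τ_1, ..., arg(p):τ_p, loc(1):ns, ..., loc(L):ns],
    each B_i is either block(Δ_i; H_i; Γ_i).e_i or hndl(Δ_i; H_i; Γ_i; s_i).e_i, and
    dom(Γ_i) = dom(Γ) for each i.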

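$$\frac{\Delta, \Delta' \vdash H \qquad \Delta, \Delta' \vdash \Gamma \qquad (\Phi; (\Delta, \Delta'); A; H; \Gamma; \tau); T; k \vdash e \leadsto I\ S}{\Phi; \Delta; A; \tau; T \vdash \mathsf{block}(\Delta'; H; \Gamma).e : (\mathsf{lbl}(\Delta'; H; \Gamma), k) \leadsto I\ S}$$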
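$$\frac{\Delta, \Delta' \vdash \Gamma \qquad (\Phi; (\Delta, \Delta'); A; \cdot; \Gamma[s \mapsto \tau_{\mathrm{exn}}]; \tau); T; (\mathsf{just}(F - 3)) \vdash e \leadsto I\ S}{\Phi; \Delta; A; \tau; T \vdash \mathsf{hndl}(\Delta'; \cdot; \Gamma; s).e : (\mathsf{hnd}(\Delta'; \cdot; \Gamma), k) \leadsto}$$

    pop edi
    mov ebp, esp
    mov |s|_Γ, eax
    I
    S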


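$$\frac{\begin{array}{c} (E = \mathrm{length}(H) \ne 0) \qquad \Delta, \Delta' \vdash H \qquad \Delta, \Delta' \vdash \Gamma \\ (\Phi; (\Delta, \Delta'); A; H; \Gamma[s \mapsto \tau_{\mathrm{exn}}]; \tau); T; (\mathsf{just}(F - 4)) \vdash e \leadsto I\ S \end{array}}{\Phi; \Delta; A; \tau; T \vdash \mathsf{hndl}(\Delta'; H; \Gamma; s).e : (\mathsf{hnd}(\Delta'; H; \Gamma), k) \leadsto}$$

    mov     edi, esp
    mov     ebp, esp
    addsptr ebp, ebp, 4(E + 1)
    mov     |s|_Γ, eax
    I
    S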


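$$\frac{(\mathrm{locs}(C) = [\mathrm{arg}(1), \ldots, \mathrm{arg}(A), \mathrm{loc}(1), \ldots, \mathrm{loc}(B)]) \qquad C \vdash v : \tau \qquad k - 4 \ge \mathsf{retplus}(0) \qquad (\mathrm{handlers}(C) = 0)}{C; T; k \vdash \mathtt{return}\ v \leadsto}$$

    mov   eax, |v|_C
    pop   ebp
    sfree (4B)
    ret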


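$$\frac{(\mathrm{locs}(C) = [\mathrm{arg}(1), \ldots, \mathrm{arg}(A), \mathrm{loc}(1), \ldots, \mathrm{loc}(B)]) \qquad C \vdash v : \tau \qquad k - 5 \ge \mathsf{retplus}(0) \qquad (X = \mathrm{handlers}(C) \ne 0)}{C; T; k \vdash \mathtt{return}\ v \leadsto}$$

    mov   eax, |v|_C
    mov   edi, [esp + 4X]
    mov   ebp, [esp + 4(X + 1)]
    sfree (4(B + X + 2))
    ret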


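$$\frac{(C(\ell) = \mathsf{lbl}(\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n; H'; \Gamma')) \qquad k - 1 \ge T(\ell) \qquad C \vdash c_i : \kappa_i \qquad C \models \Gamma'[\vec{c}/\vec{\alpha}] \qquad C \models H'[\vec{c}/\vec{\alpha}]}{C; T; k \vdash \mathtt{goto}\ \ell[c_1, \ldots, c_n] \leadsto \mathtt{jmp}\ \ell}$$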


$$\frac{k - 3 \ge \mathsf{just}\ F \qquad C \vdash v : \tau_{\mathrm{exn}} \qquad C \models \mathsf{canraise}}{C; T; k \vdash \mathtt{raise}\ v \leadsto}$$

    mov eax, |v|_C
    mov esp, edi
    ret
$$\frac{\begin{array}{c} C \vdash v : (\tau'_1, \ldots, \tau'_n) \to \tau'' \qquad C \models \mathsf{canraise} \\ C \vdash v_i : \tau'_i\ \ \text{for}\ 1 \le i \le n \qquad C[s \mapsto \tau'']; T; (k - (n + 3 + E)) \vdash e \leadsto I\ S \end{array}}{C; T; k \vdash \mathtt{let}\ s = v(v_1, \ldots, v_n)\ \mathtt{in}\ e \leadsto}$$

    push  |v_n|_C
    ...
    push  |v_1|_C
    call  |v|_C
    mov   |s|_C, eax
    sfree 4n
    I
    S


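$$\frac{\begin{array}{c} C \vdash v : \tau'\ \mathsf{array} \qquad C \vdash v' : \mathsf{int} \\ C[s \mapsto \tau']; T; (k - 7) \vdash e \leadsto I\ S \qquad C; T; (k - 4) \vdash \mathtt{raise}\ \mathit{arrayexn} \leadsto I_e\ S_e \end{array}}{C; T; k \vdash \mathtt{let}\ s = \mathtt{sub}(v, v')\ \mathtt{in}\ e \leadsto}$$

    mov   eax, |v|_C
    mov   ecx, |v'|_C
    cmpja [eax], ecx, ℓ_pass
    I_e
    S_e
ℓ_pass : ∀α_sz:N. LL(C, [eax ↦ box(set_=(α_sz) × mbox(|τ'| ↑ α_sz)), ecx ↦ set_<(α_sz)]) =
    mov eax, [eax + 4]
    mov eax, [eax + 0 + 4·ecx]
    mov |s|_C, eax
    I
    S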


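$$\frac{\begin{array}{c} C \vdash v_1 : \tau'\ \mathsf{array} \qquad C \vdash v_2 : \mathsf{int} \\ C; T; (k - 4) \vdash \mathtt{raise}\ \mathit{arrayexn} \leadsto I_e\ S_e \qquad C \vdash v_3 : \tau' \qquad C; T; (k - 7) \vdash e \leadsto I\ S \end{array}}{C; T; k \vdash \mathtt{let}\ \mathtt{sub}(v_1, v_2) := v_3\ \mathtt{in}\ e \leadsto}$$

    mov   eax, |v_1|_C
    mov   ecx, |v_2|_C
    cmpja [eax], ecx, ℓ_pass
    I_e
    S_e
ℓ_pass : ∀α_sz:N. LL(C, [eax ↦ box(set_=(α_sz) × mbox(|τ'| ↑ α_sz)), ecx ↦ set_<(α_sz)]) =
    mov eax, [eax + 4]
    mov edx, |v_3|_C
    mov [eax + 0 + 4·ecx], edx
    I
    S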
$$\frac{C \vdash v : [J, j{:}\tau'] \qquad C[s \mapsto [j{:}\tau']]; T; (k - 4) \vdash e_1 \leadsto I_1\ S_1 \qquad C[s \mapsto [J]]; T; (k - 4) \vdash e_2 \leadsto I_2\ S_2}{C; T; k \vdash \mathtt{case}\ v\ \mathtt{of}\ \mathtt{inj}(j, s) \Rightarrow e_1\ \mathtt{else}\ e_2 \leadsto}$$

    mov   eax, |v|_C
    cmpje [eax], j, ℓ_match
    mov   |s|_C, eax
    I_2
    S_2
ℓ_match : LL(C, [eax ↦ |[j:τ']|]) =
    mov   |s|_C, eax
    I_1
    S_1


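$$\frac{C \vdash v_i : \mathsf{int}\ \ (i = 1, 2) \qquad C; T; (k - 3) \vdash e_1 \leadsto I_1\ S_1 \qquad C; T; (k - 3) \vdash e_2 \leadsto I_2\ S_2}{C; T; k \vdash \mathtt{if}\ v_1 = v_2\ \mathtt{then}\ e_1\ \mathtt{else}\ e_2 \leadsto}$$

    mov eax, |v_1|_C
    cmp eax, |v_2|_C
    jne ℓ_else
    I_1
    S_1
ℓ_else : LL(C, []) =
    I_2
    S_2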


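$$\frac{C \vdash v : (\tau_0, \ldots, \tau_m) \qquad C \vdash v' : \tau_i \qquad C; T; (k - 3) \vdash e \leadsto I\ S}{C; T; k \vdash \mathtt{let}\ \pi_i\,v := v'\ \mathtt{in}\ e \leadsto}$$

    mov eax, |v|_C
    mov ecx, |v'|_C
    mov [eax + 4i], ecx
    I
    S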


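$$\frac{C \vdash v_i : \mathsf{int}\ \ (i = 1, 2) \qquad C; T; (k - 3) \vdash e_1 \leadsto I_1\ S_1 \qquad C; T; (k - 3) \vdash e_2 \leadsto I_2\ S_2}{C; T; k \vdash \mathtt{if}\ v_1 \le v_2\ \mathtt{then}\ e_1\ \mathtt{else}\ e_2 \leadsto}$$

    mov eax, |v_1|_C
    cmp eax, |v_2|_C
    ja  ℓ_else
    I_1
    S_1
ℓ_else : LL(C, []) =
    I_2
    S_2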


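$$\frac{C \vdash v : \exists\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n.\tau' \qquad (C \oplus (\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n))[s \mapsto \tau']; T; (k - 2) \vdash e \leadsto I\ S}{C; T; k \vdash \mathtt{let}\ (\alpha_1, \ldots, \alpha_n, s) = \mathtt{unpack}\ v\ \mathtt{in}\ e \leadsto}$$

    mov eax, |v|_C
    mov |s|_C, eax
    I
    S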
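$$\frac{\begin{array}{c} (C(\ell) = \mathsf{hnd}(\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n; H'; \Gamma')) \qquad (\mathrm{handlers}(C) = 0) \\ C \vdash c_i : \kappa_i \qquad C \models H'[\vec{c}/\vec{\alpha}] \qquad C \oplus (\Gamma'[\vec{c}/\vec{\alpha}]); T; (k - 3) \vdash e \leadsto I\ S \end{array}}{C; T; k \vdash \mathtt{pushhandler}\ \ell[c_1, \ldots, c_n]\ \mathtt{in}\ e \leadsto}$$

    push edi
    push ℓ
    mov  edi, esp
    I
    S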


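$$\frac{\begin{array}{c} (C(\ell) = \mathsf{hnd}(\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n; H'; \Gamma')) \qquad (\mathrm{handlers}(C) \ne 0) \\ C \vdash c_i : \kappa_i \qquad C \models H'[\vec{c}/\vec{\alpha}] \qquad C \oplus (\Gamma'[\vec{c}/\vec{\alpha}]); T; (k - 2) \vdash e \leadsto I\ S \end{array}}{C; T; k \vdash \mathtt{pushhandler}\ \ell[c_1, \ldots, c_n]\ \mathtt{in}\ e \leadsto}$$

    push ℓ
    mov  edi, esp
    I
    S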


$$\frac{(\mathrm{handlers}(C) = 1) \qquad \mathrm{poph}(C); T; (k - 2) \vdash e \leadsto I\ S}{C; T; k \vdash \mathtt{pophandler}\ \mathtt{in}\ e \leadsto}$$

    mov   edi, [esp + 4]
    sfree 8
    I
    S


$$\frac{(\mathrm{handlers}(C) > 1) \qquad \mathrm{poph}(C); T; (k - 2) \vdash e \leadsto I\ S}{C; T; k \vdash \mathtt{pophandler}\ \mathtt{in}\ e \leadsto}$$

    sfree 4
    mov   edi, esp
    I
    S


$$\frac{C \vdash v : \tau' \qquad C[s \mapsto \tau']; T; (k - 2) \vdash e \leadsto I\ S}{C; T; k \vdash \mathtt{let}\ s = v\ \mathtt{in}\ e \leadsto}$$

    mov eax, |v|_C
    mov |s|_C, eax
    I
    S


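$$\frac{C \vdash v_i : \mathsf{int}\ \ (i = 1, 2) \qquad C[s \mapsto \mathsf{int}]; T; (k - 3) \vdash e \leadsto I\ S}{C; T; k \vdash \mathtt{let}\ s = +(v_1, v_2)\ \mathtt{in}\ e \leadsto}$$

    mov eax, |v_1|_C
    add eax, eax, |v_2|_C
    mov |s|_C, eax
    I
    S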


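$$\frac{C \vdash v_i : \tau_i\ \ \text{for}\ 1 \le i \le n \qquad C[s \mapsto (\tau_1, \ldots, \tau_n)]; T; (k - (2n + 2)) \vdash e \leadsto I\ S}{C; T; k \vdash \mathtt{let}\ s = (v_1, \ldots, v_n)\ \mathtt{in}\ e \leadsto}$$

    push   |v_n|_C
    ...
    push   |v_1|_C
    malloc eax, ebx, 4n
    pop    [eax + 4·0]
    ...
    pop    [eax + 4·(n − 1)]
    mov    |s|_C, eax
    I
    S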
$$\frac{C \vdash v : (\tau_0, \ldots, \tau_n) \qquad C[s \mapsto \tau_i]; T; (k - 3) \vdash e \leadsto I\ S}{C; T; k \vdash \mathtt{let}\ s = \pi_i\,v\ \mathtt{in}\ e \leadsto}$$

    mov eax, |v|_C
    mov eax, [eax + 4i]
    mov |s|_C, eax
    I
    S


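$$\frac{C \vdash \tau' = [\ldots, j{:}\tau_j, \ldots] : \mathsf{T} \qquad C \vdash v : \tau_j \qquad C[s \mapsto \tau']; T; (k - 5) \vdash e \leadsto I\ S}{C; T; k \vdash \mathtt{let}\ s = \mathtt{inj}_{\tau'}(j, v)\ \mathtt{in}\ e \leadsto}$$

    push   |v|_C
    malloc eax, ebx, 8
    mov    [eax], j
    pop    [eax + 4]
    mov    |s|_C, eax
    I
    S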


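$$\frac{C \vdash v : [j{:}\tau'] \qquad C[s \mapsto \tau']; T; (k - 3) \vdash e \leadsto I\ S}{C; T; k \vdash \mathtt{let}\ s = \mathtt{out}_j(v)\ \mathtt{in}\ e \leadsto}$$

    mov eax, |v|_C
    mov eax, [eax + 4]
    mov |s|_C, eax
    I
    S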


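$$\frac{C; T; (\mathsf{just}\ m) \vdash e \leadsto I\ S}{C; T; (\mathsf{just}\ n) \vdash e \leadsto \mathrm{YIELD}(n, m)\ I\ S} \qquad \frac{C; T; (\mathsf{just}\ m) \vdash e \leadsto I\ S}{C; T; (\mathsf{retplus}\ n) \vdash e \leadsto \mathrm{YIELD}(n, m)\ I\ S}$$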
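The handler-related rules in this figure maintain a simple invariant: edi points at the innermost handler's label on the stack, and the edi value that was live before the first handler was pushed is saved just below that first label. The Python toy model below replays the code of the pushhandler, pophandler, raise and hndl rules on a list-based stack to show the invariant at work. It assumes nothing is left on the stack between two handler records, models esp as the list length, and ignores the minor clock entirely; the class and its method names are hypothetical.

```python
class Machine:
    """Toy replay of the handler-related code sequences (hypothetical model).

    The stack is a list whose top is the last element, and esp is modeled as
    len(stack), so [esp + 4*i] corresponds to stack[-1 - i]. Only edi, the
    stack, and handlers(C) are tracked.
    """

    def __init__(self, caller_edi):
        self.stack = []
        self.edi = caller_edi
        self.depth = 0                      # handlers(C)

    def esp(self):
        return len(self.stack)

    def push_handler(self, label):
        if self.depth == 0:
            self.stack.append(self.edi)     # push edi (first handler only)
        self.stack.append(label)            # push l
        self.edi = self.esp()               # mov edi, esp
        self.depth += 1

    def pop_handler(self):
        if self.depth == 1:
            self.edi = self.stack[-2]       # mov edi, [esp + 4]
            del self.stack[-2:]             # sfree 8
        else:
            del self.stack[-1:]             # sfree 4
            self.edi = self.esp()           # mov edi, esp
        self.depth -= 1

    def raise_(self):
        """raise (mov esp, edi; ret) followed by the entry code of the hndl rules."""
        if self.depth == 0:
            raise RuntimeError("raising past the function's own handlers is not modeled")
        del self.stack[self.edi:]           # mov esp, edi: discard everything above
        label = self.stack.pop()            # ret: pop the handler label and jump to it
        self.depth -= 1
        if self.depth == 0:
            self.edi = self.stack.pop()     # hndl with empty H: pop edi
        else:
            self.edi = self.esp()           # hndl with nonempty H: mov edi, esp
        return label
```

Pushing handlers h1 and h2 and then raising twice first lands in h2 with edi pointing at h1's label, and then in h1 with the caller's edi restored, which is the behavior the hndl rules' entry code is designed to produce.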
Chapter 8

Diverse  Safety  Policies 


Most of the thesis up to this point has focused on certification of programs with respect to one specific safety policy. This is all very well, but it is important to realize that the virtual clock mechanism and the type theory of TALT-R can be adapted to work with other safety policies, extending the applicability of this work to many other situations. In this chapter I shall describe a number of possible modifications to the TALT-R language and their application to a wide range of safety policies.


8.1  Adaptive  Responsiveness 

The version of TALT-R presented in detail in earlier chapters takes the maximum yield period
Y  to  be  a  fixed  number,  chosen  in  advance  and  "hard-wired"  into  the  type  system  and  into  the 
certifying  compilation  and  verification  machinery.  For  practical  purposes,  this  lack  of  flexibility  is 
likely  to  be  a  serious  problem.  The  correct  value  of  Y  for  optimal  performance  will  vary  from  one 
situation  to  the  next,  depending  on  the  cost  of  the  yield  operation  and  the  system-specific  timing 
requirements,  among  other  factors.  What  is  more,  the  optimal  Y  may  vary  over  time  even  for  the 
same  supervisor. 

Here  is  an  idea  for  a  flexible  solution:  each  time  the  program  yields,  let  the  supervisor  specify 
the  deadline  for  the  next  yield  by  placing  that  value  in  a  register  before  returning  control  to  the 
program.  The  program  can  then  load  this  value  into  a  clock  register  and  continue  with  a  minor 
yielding  strategy  more  or  less  exactly  as  described  in  Section  6.5.  Of  course,  it  will  not  do  to  allow 
the  supervisor  to  specify  any  number  it  chooses.  In  order  to  write  programs  that  are  safe  under  this 
new  policy,  it  must  be  possible  to  place  clock  checks  —  minor  yields  —  close  enough  together  that 
no  yield  deadline  will  ever  be  missed  no  matter  what  the  supervisor  does.  Unless  there  is  a  lower 
bound  on  the  inter-yield  times  the  supervisor  can  demand,  it  will  be  impossible  for  a  nontrivial 
program to satisfy the policy. So let the safety policy specify a minimum value $Y_0$ for the maximum yield period, large enough to admit reasonably spaced minor yields.

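Syntactically, we change the yield instruction so that it requires a destination. The typing rule becomes:

$$\frac{(\Delta, \alpha{:}\mathsf{N}); \Phi; \Gamma \vdash d : S(\alpha + Y_0) \to \Gamma' \qquad (\Delta, \alpha{:}\mathsf{N}); \Gamma'\{\mathrm{ck} : \alpha + Y_0\} \vdash I}{\Delta; \Phi; \Gamma \vdash \mathtt{yield}\ d;\ I}$$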

According to this rule, the value returned by the modified yield instruction is the representation of the value to which the virtual clock has been set. This number, denoted by the static term $\alpha + Y_0$, is statically unknown but is clearly at least $Y_0$. Furthermore, assuming $Y_0 \ge L + 3$, if $d = \mathrm{rck}$ then
we have

$$\Gamma'\{\mathrm{ck} : \alpha + Y_0\} \le \Gamma\{\mathrm{rck} : S(\alpha + Y_0 - L - 3 + L + 3),\ \mathrm{ck} : 1 + L + (2 + (\alpha + Y_0 - L - 3))\}.$$

If we use the unchecked singleton subtraction described in Section 3.3.4 to subtract L + 3 from rck, which takes one instruction, we obtain the register typing

$$\Gamma\{\mathrm{rck} : S(\alpha + Y_0 - L - 3),\ \mathrm{ck} : L + (2 + (\alpha + Y_0 - L - 3))\} = \Gamma\{\mathrm{mck}_{\alpha + Y_0 - L - 3} \mapsto L\},$$

i.e., we have set the minor clock to L. Moreover, L can be any number of our choosing that is less than or equal to $Y_0 - 3$. In fact, choosing $L = Y_0 - 3$ may sometimes make sense: remember that $Y_0$ is the lower limit of a dynamically varying inter-yield allowance, and so it is probably much smaller than the Y of earlier chapters. Depending on the application, instances where the supervisor returns $10Y_0$ or even $1000Y_0$ as the actual deadline may be common; if so, there may be little benefit in checking the clock much more often than every $Y_0$ instructions.
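The register arithmetic above can be checked numerically. The Python sketch below plays out `yield rck` followed by the one-instruction subtraction of L + 3, for a deadline chosen by the supervisor, and confirms that the result has the minor-clock shape ck = L + (2 + rck) with rck nonnegative. The function name is hypothetical, and each instruction is assumed to cost exactly one tick of the virtual clock.

```python
def after_adaptive_yield(deadline: int, L: int, Y0: int):
    """Return (rck, ck) just after `yield rck` and subtracting L + 3 from rck.

    deadline -- the clock value the supervisor hands back, at least Y0
    """
    if not deadline >= Y0 >= L + 3:
        raise ValueError("the policy requires deadline >= Y0 >= L + 3")
    rck = deadline      # yield stores alpha + Y0 in the destination ...
    ck = deadline       # ... and the virtual clock reads the same value
    rck -= L + 3        # unchecked singleton subtraction: one instruction
    ck -= 1
    assert rck >= 0 and ck == L + (2 + rck)   # the minor clock is now L
    return rck, ck


for d in (13, 14, 130, 13_000):               # e.g. Y0, Y0 + 1, 10*Y0, 1000*Y0
    print(d, after_adaptive_yield(d, L=10, Y0=13))
```

Here L = 10 = Y0 − 3, the largest value the text allows; any smaller L also passes the assertion.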


8.2  The  Engine  Abstraction 

The engine abstraction is an approach to multitasking and preemption popular in Scheme programming. An engine is a computation that can be executed subject to a time limit, which may or may not be enough for it to finish. This time limit is referred to as the amount of fuel given to the engine. If the engine finishes its computation before running out of fuel, it returns the computed value; if it does not, it returns a new engine which, if invoked, will resume the computation where the old one left off. Haynes and Friedman [34] showed how to implement user-level threads using engines; Dybvig and Hieb [22] have shown that engines may in turn be implemented using call/cc and a timer interrupt. Finally, and most interestingly, Dybvig's Scheme programming book [21] shows how to implement a form of engines without the help of a system-provided asynchronous timer interrupt. (Sitaram's online text [63] provides another good introduction to engines for novice Scheme programmers.)

Dybvig's  interrupt-free  engine  implementation  is  less  satisfying  than  the  alternatives  from  a 
pragmatic  standpoint,  in  that  the  code  executed  by  an  engine  is  responsible  for  decrementing  a 
timer  periodically  to  track  its  consumption  of  fuel.  (Thus,  although  Dybvig  advertises  engines  as 
an  abstraction  of  "timed  preemption,"  the  implementation  he  provides  is  not  preemptive  at  all 
and  may  fail  to  work  properly  if  engine  code  is  not  written  in  a  certain  way.)  However,  there  are 
easy  parallels  to  draw  between  the  code  run  by  an  engine  under  Dybvig's  system  and  that  of  a 
TALT-R-certified  program. 

The  code  to  be  executed  by  an  engine  is  given  as  a  thunk,  or  a  function  of  no  arguments.  If  that 
function  returns  normally,  its  return  value  is  the  engine's  result.  To  handle  the  case  of  running 
out of time, Dybvig's implementation defines a global variable called do-expire whose value
is  a  function  to  be  called  by  the  engine's  code  upon  discovering  it  has  run  out  of  fuel.  In  turn, 
do-expire  uses  call/cc  to  create  a  new  engine  which,  if  invoked,  will  cause  do-expire  to 
return  the  amount  of  additional  fuel  provided  to  continue  the  computation;  this  new  engine  is  then 
passed  to  the  continuation  of  the  engine  invocation,  returning  control  to  the  client.  In  other  words, 
do-expire  suspends  the  execution  of  the  engine  indefinitely  and  (if  it  returns  at  all)  returns  the 
number  of  "ticks"  until  it  must  suspend  again.  This  is  exactly  the  same  as  the  behavior  of  the 
adaptive  yield  instruction  in  Section  8.1.  What  is  more,  the  function  responsible  for  calling 
do-expire  (called  decrement-timer)  is  precisely  analogous  to  the  adaptive  minor  yield:  it 
attempts to decrement the amount of fuel remaining, and if this reaches zero, it calls do-expire
and  resets  the  fuel  level  to  the  value  thus  obtained. 

One can therefore think of TALT-R as a type system for writing cooperative engines: if yield is implemented as a call to do-expire, then any well-typed TALT-R program (written using the adaptive yield) describes an engine that is guaranteed to behave well under a non-preemptive implementation of the engine mechanism.
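To make the analogy concrete, here is a small Python sketch of a cooperative engine built from a generator rather than call/cc; it is not Dybvig's Scheme code. The body counts its own fuel, as decrement-timer does, and a bare `yield` plays the role of do-expire: it suspends the engine and, when resumed, receives the fresh fuel allotment, much as the adaptive yield returns the new deadline. Unlike Dybvig's engines, `run` hands back the same resumable object instead of a new engine. The names `Engine`, `run` and `counter` are hypothetical.

```python
from typing import Generator, Tuple, Union


def counter(n: int) -> Generator[None, int, int]:
    """Engine body: sums 0..n-1, spending one tick of fuel per iteration."""
    fuel = yield                       # receive the initial fuel
    total = 0
    for i in range(n):
        if fuel == 0:                  # decrement-timer finds the tank empty:
            fuel = yield               # "do-expire": suspend, resume with new fuel
        fuel -= 1
        total += i
    return total


class Engine:
    def __init__(self, body: Generator[None, int, int]):
        self.body = body
        next(body)                     # run up to the initial `fuel = yield`

    def run(self, fuel: int) -> Tuple[bool, Union[int, "Engine"]]:
        """(True, result) if the body finishes within `fuel`, else (False, engine to resume)."""
        try:
            self.body.send(fuel)
        except StopIteration as finished:
            return True, finished.value
        return False, self
```

With `e = Engine(counter(5))`, two calls to `e.run(2)` each expire after two iterations, and `e.run(10)` then finishes and returns `(True, 10)`.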


8.3  Running  Time 

In  contrast  to  most  previous  work  on  certification  of  time  bounds,  the  bulk  of  this  thesis  has  been 
devoted  to  a  policy  to  which  any  program,  appropriately  compiled,  can  be  made  to  conform.  It 
is  only  a  matter  of  inserting  enough  yields,  and  I  have  assumed  that  the  yield  instruction  has 
no  observable  effect  from  the  certified  program's  point  of  view.  This  assumption  is  more  or  less 
consistent  with  an  applet-like  or  mobile  agent-like  model  in  which  the  untrusted  code  is  executed 
by  a  supervisor  or  host  on  behalf  of  some  other  party.  The  relationship  between  an  operating 
system  kernel  and  most  of  the  user  processes  running  under  it  is  similar  in  that  the  supervisor  is 
not  interested  in  the  correctness,  or  even  the  performance,  of  the  subordinate  processes,  and  the 
safety  policy  exists  to  isolate  processes  from  one  another  rather  than  to  govern  any  kind  of  critical 
interaction. 

In  order  to  conduct  a  meaningful  discussion  of  applications  that  require  certified  bounds  on 
the  total  running  time  of  a  program  or  function,  it  is  necessary  to  distinguish  between  two  broad 
subclasses.  The  first  is  a  time-sensitive  client-server  model  in  which  the  consumer  is  a  host  that 
executes  untrusted  code  on  behalf  of  other  parties.  In  these  systems,  the  consumer  does  not  care 
about  the  results  of  the  untrusted  computation,  even  though  it  may  care  about  the  time  it  takes  to 
compute  them.  The  second  class  is  a  plugin-like  model  in  which  the  supervisor  (perhaps  an  OS 
kernel)  uses  some  untrusted  code  to  perform  a  useful  function  (perhaps  a  device  driver  or  packet 
filter).  It  may  well  be  important  for  reliability  that  the  routines  exported  by  the  untrusted  plugin 
produce  meaningful  results  or  effects  within  a  certain  amount  of  time,  and  it  is  therefore  legitimate 
to  include  such  requirements  in  the  safety  policy  that  plugins  are  expected  to  obey. 

If we remove the yield instruction from TALT-R, then we can certainly devise types for functions that capture the timing policy in either of these two classes of application. For instance, the TALres-like code type

$$\forall\alpha{:}\mathsf{N}.\,\forall\rho{:}\mathsf{TD}.\ \{\mathrm{eax}{:}\mathsf{B4},\ \mathrm{esp}{:}(\{\mathrm{eax}{:}\mathsf{B4},\ \mathrm{esp}{:}\rho,\ \mathrm{ck}{:}\alpha\} \to 0) \times \rho,\ \mathrm{ck}{:}k + \alpha\} \to 0$$

(previously encountered in Chapter 6's discussion of Feeley yielding) describes a function that takes at most k instructions: it is entered with k + α on the virtual clock, and its return continuation demands only α. Indeed, without the yield instruction, any function with this type will obey a very strict time bound.

The  problem  is  that  the  somewhat  impoverished  logic  available  for  clock  reasoning  in  TALT-R 
cannot  give  this  type  to  any  function  that  contains  any  loops,  recursion,  or  other  nontrivial  control 
flow. In fact, TALT-R as described seems much less suited to this kind of policy than TALres, which includes a good deal more abstract reasoning power for proving interesting time bounds. An obvious way to remedy this shortcoming of TALT-R is to endow it with a more powerful logic, perhaps similar to that of TALres, and in fact, the version of TALT implemented by Crary includes some LX-like features similar to those that give TALres its power, which might go a long way in this direction. It is unknown at this time, however, whether or not "LX-ified" TALT-R would
provide  a  scalable  solution. 
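One way to see why loops are the sticking point: for loop-free code, a bound like the k in the type above can be established by a simple longest-path computation over block costs, whereas a cycle admits no bound from block costs alone. The Python sketch below illustrates only that contrast and is not TALT-R's clock logic; the control-flow-graph encoding and the function name are hypothetical.

```python
from typing import Dict, List


def worst_case_cost(cfg: Dict[str, List[str]], cost: Dict[str, int], entry: str) -> int:
    """Costliest path (in instructions) from `entry` through a loop-free CFG.

    cfg maps each block to its successors; cost gives each block's instruction
    count. A cycle means block costs alone yield no finite bound.
    """
    memo: Dict[str, int] = {}
    on_path = set()

    def visit(b: str) -> int:
        if b in on_path:
            raise ValueError(f"cycle through {b}: no static bound")
        if b not in memo:
            on_path.add(b)
            memo[b] = cost[b] + max((visit(s) for s in cfg[b]), default=0)
            on_path.remove(b)
        return memo[b]

    return visit(entry)
```

A function whose graph passes this check could be given the type above with any k at least the returned cost; a loop makes the check fail, which is where a richer logic such as that of TALres becomes necessary.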
Interestingly, major additions are not necessary in the case of the client-server model. Let TALT-R be altered such that the halt instruction may be executed at any time, regardless of context. (The official version of halt may only be performed when the stack is empty.) Then replacing the yield instruction in the minor yield with a halt produces a code fragment with the same typing properties as a minor yield. Now, if we compile a program using this modified minor yield (or "minor halt"), the resulting code is guaranteed to terminate within Y instructions: it either terminates normally, as the programmer intended, or it runs out of time and halts.
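A Python sketch of the minor-halt idea: a loop whose iterations have a known static cost checks the remaining budget before each iteration and halts without a result when the budget of Y instructions would be exceeded. The overhead of the check itself is ignored; `run_with_minor_halts`, `factorial_within` and the per-iteration cost of 4 are hypothetical.

```python
from typing import Callable, Optional, TypeVar

S = TypeVar("S")


def run_with_minor_halts(step: Callable[[S], S], done: Callable[[S], bool],
                         state: S, iter_cost: int, Y: int) -> Optional[S]:
    """Run `step` until `done`, charging iter_cost per iteration against a budget of Y.

    Before each iteration a 'minor halt' checks the remaining budget; if it
    cannot cover another iteration, the program halts with no result (None).
    """
    clock = Y
    while not done(state):
        if clock < iter_cost:
            return None               # minor halt: out of time, no result
        clock -= iter_cost
        state = step(state)
    return state                      # normal termination


def factorial_within(n: int, Y: int):
    # state = (i, acc); each iteration is assumed to cost 4 instructions
    return run_with_minor_halts(lambda s: (s[0] + 1, s[1] * s[0]),
                                lambda s: s[0] > n, (1, 1), 4, Y)
```

With Y = 20, computing 5! fits exactly (five iterations of cost 4) and returns normally, while 6! is cut off by a minor halt; the consumer's guarantee holds either way.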

From  the  consumer's  point  of  view,  this  is  exactly  what  was  required.  Things  are  a  little  more 
complicated  from  the  producer's  point  of  view.  The  best  results  from  this  side  are  obtained  by 
using  as  precise  a  method  of  "yield"  placement  as  possible,  perhaps  the  precise  yield-on-jump 
strategy  of  Section  6.5.  If  we  use  such  a  precise  strategy  and  ignore  the  overhead  of  instruction 
counting,  it  is  safe  to  say  that  if  the  original  program  would  have  finished  in  under  Y  instructions 
without  the  minor  halts,  then  the  program  compiled  with  minor  halts  will  finish  normally  and 
with  the  same  result.  In  other  words,  if  we  include  the  timing  requirement  among  the  criteria 
for  correctness  and  ignore  the  complications  of  overhead,  we  can  say  that  the  insertion  of  minor 
halts  does  not  affect  the  semantics  of  a  correct  program.  Of  course,  the  type  system  is  no  help 
when  it  comes  to  guaranteeing  correctness  —  but  certification  is  for  the  consumer's  benefit,  not  the 
producer's.  That  the  program  must  produce  a  useful  result  after  its  allotment  of  Y  instructions 
is  a  self-imposed  requirement  of  the  producer  and  is  therefore  irrelevant  to  certification.  If  the 
producer  can  come  up  with  a  correct  program,  and  satisfy  herself  of  its  correctness  by  any  means 
whatsoever,  then  it  can  be  certified  such  that  the  consumer  will  accept  and  run  it. 

Unfortunately, the success of this idea depends on the availability of an "escape route" through which a program can terminate without producing meaningful results. It may not always be possible to provide such a convenience: for instance, in plugin-like systems with hard real-time constraints, it can be safety-critical that the untrusted code not merely terminate within the allotted time but provide a useful result. In systems of this kind, the problem of certifying that a computation produces its results correctly and on time remains as difficult as ever.


8.4  Virtual  Versus  Real  Clocks 

All  the  safety  policies  discussed  so  far  measure  elapsed  time  by  counting  instructions.  For  this 
to  make  sense,  it  has  been  necessary  to  make  the  tacit  assumption  that  a  reasonable  upper  bound 
can  be  placed  on  the  time  any  instruction  takes  to  execute,  so  that  multiplying  this  quantity  by  the 
number  of  instructions  in  a  sequence  may  be  assumed  to  provide  a  reasonable  upper  bound  on  the 
execution  time  of  the  sequence.  Perhaps  at  one  time  in  the  history  of  digital  computers  that  may 
have  been  true,  but  it  represents  a  tremendous  oversimplification  of  the  behavior  of  present-day 
desktop  and  server  microprocessors.  The  actual  execution  time  of  a  memory  read,  for  example, 
depends  upon  the  state  of  the  memory  hierarchy  and  can  vary  over  several  orders  of  magnitude. 
As  a  result,  it  is  far  from  simple  to  produce  conservative  static  estimates  of  running  time  that  are 
precise  enough  to  be  useful. 

There are two issues that must be dealt with in order to develop a real-time version of TALT-R.
First,  certain  instructions  and  certain  combinations  of  operands  take  more  cycles  to  compute  than 
others;  and  second,  the  execution  time  of  any  given  instruction  is  highly  variable  (which  means  it 
is  usually  much  less  than  its  maximum  value).  These  issues  are  independent  and  can  be  addressed 
separately. 
8.4.1 Unpredictability

Because  it  is  so  difficult  to  make  usefully  precise  conservative  static  predictions  of  execution  time, 
it  simplifies  matters  to  treat  time  as  utterly  unpredictable,  except  for  very  loose  upper  bounds 
known in advance. In order to make sense of this, it is necessary to distinguish between the TALT-R abstract machine and the concrete machine (something like an Intel Pentium) on which programs
will  actually  be  run.  For  the  time  being,  I  continue  to  assume  the  existence  of  a  single  upper 
bound  on  the  execution  time  of  any  concrete  machine  instruction;  this  amount  of  time  is  the  virtual 
clock  unit  (or  vcu).  It  follows  that  any  sequence  of  instructions  during  which  the  virtual  clock 
of  the  abstract  machine  decreases  from  Y  to  zero  takes  the  concrete  machine  at  most  Y  vcu  to 
execute  —  in  fact,  it  will  usually  be  much  faster  than  that.  If  the  desired  safety  policy  is  that  the 
concrete  machine  yield  at  least  once  every  Y  vcu,  then  the  expression  of  that  safety  policy  as  the 
requirement  that  the  virtual  clock  remain  nonnegative  is  overly  conservative.  Most  of  the  time, 
when  the  virtual  clock  reaches  zero,  a  concrete  machine  will  still  have  time  to  perform  many  more 
instructions  before  yielding.  The  bad  news  is  that,  by  assumption,  we  cannot  statically  make 
any  better  predictions  than  the  very  loose  bound  of  one  vcu  per  concrete  instruction.  We  can, 
however,  endow  the  abstract  machine  with  the  means  to  discover  the  availability  of  more  cycles 
after  its  virtual  clock  has  run  out. 

Fortunately,  most  modern  computers  and  embedded  microcontrollers  possess  a  reliable  way 
to measure time. For instance, the Intel x86 family of processors, starting with the Pentium, has an rdtsc ("read timestamp counter") instruction that produces the 64-bit number of clock cycles that have elapsed since the processor was last reset [38, Vol. 2, p. 3-604]. (Unlike the virtual clock of the TALT-R abstract machine, the cycles of this clock all take the same amount of real time.)
Even  in  the  absence  of  certification,  a  program  that  needs  to  observe  a  strict  time  limit  can  take 
advantage  of  features  like  this  one  to  monitor  its  own  progress.  It  is  possible  to  extend  the  type 
system of TALT-R to certify the correctness of such techniques, guaranteeing that programs obey
the  policy  even  if  conformance  depends  on  the  clock-checking  behavior  in  a  critical  way. 

Let TALT-R be extended with a new compound instruction check d that computes the real time remaining until the program's next deadline (whether for termination or yielding), stores the result (in virtual clock units) in destination d, and resets the virtual clock to this quantity. Since check is an abstract machine instruction implemented by a sequence of more than one concrete machine instruction, the amount by which it decrements the virtual clock will be greater than one; call this number $c_{\mathrm{check}}$. (In other words, let $c_{\mathrm{check}}$ be the maximum number of vcus it can take to execute a check instruction on a concrete machine.) The following typing rule describes this new instruction:

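$$\frac{(\Gamma(\mathrm{ck}) = c_{\mathrm{check}} + t) \qquad (\Delta, \alpha{:}\mathsf{N}); \Phi; \Gamma \vdash d : S(\alpha + t) \to \Gamma' \qquad (\Delta, \alpha{:}\mathsf{N}); \Gamma'\{\mathrm{ck} : \alpha + t\} \vdash I}{\Delta; \Phi; \Gamma \vdash \mathtt{check}\ d;\ I}$$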

This  rule  is  very  similar  to  the  one  for  the  adaptive  yield  in  Section  8.1,  in  that  it  connects  the 
integer  value  returned  by  the  instruction  to  the  value  of  the  virtual  clock.  Indeed,  from  the  point 
of  view  of  the  abstract  machine,  check  has  almost  exactly  the  same  semantics  as  the  adaptive 
yield:  it  resets  the  virtual  clock  to  an  unpredictable  quantity  and  stores  that  quantity  in  the  given 
destination.  In  this  case,  however,  note  that  there  must  be  enough  time  on  the  virtual  clock  to 
perform  the  check  (it  is  not  a  yield,  after  all,  so  its  cost  must  be  counted).  Since  the  virtual  clock 
is known to read at least $c_{\mathrm{check}} + t$ before this instruction, and it costs at most $c_{\mathrm{check}}$, it is safe to
assume  that  it  reads  at  least  t  afterward. 

If the check operation is very cheap, it makes sense to do away with the use of a "clock register" and rely solely on the built-in real clock to keep track of time. Figure 8.1 shows a code fragment that implements a clock check: assuming it is executed with at least $c_{\mathrm{check}} + 2$ virtual clock cycles remaining, CHECK ensures that when control reaches the label end there is enough time on the clock to perform L instructions plus another CHECK. The implementation is analogous to the minor yield, except that the result of the check instruction is not stored in a "clock register." If that result is too small to satisfy the postcondition without yielding, a yield is performed.


CHECK =
        // ck : c_check + 2
        check   eax
        // α:N, eax : S(α + 2), ck : α + 2
        subjae  eax, eax, (L + c_check + 2), end
        // ck : α
        yield
        // ck : Y, where Y ≥ L + c_check + 2
        // ck : Y = L + (c_check + 2 + α')
end:
        // α' : N, ck : L + (c_check + 2) + α'
        // hence ck : L + (c_check + 2)

Figure 8.1: Code for a Clock Check
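The dynamic test that CHECK performs can be simulated in a few lines. This is a minimal sketch and does not reproduce the exact costs in Figure 8.1: it assumes a hypothetical worst-case check cost C_CHECK, a hypothetical cost BRANCH_COST for the compare-and-branch, a hypothetical yield period Y, and that a yield resets the virtual clock to Y. The parameter `need` stands for whatever the following code requires, for example L instructions plus the cost of the next check.

```python
C_CHECK = 10      # hypothetical worst-case cost of `check`, in vcus
BRANCH_COST = 2   # hypothetical cost of the compare-and-branch (`subjae`)
Y = 100_000       # hypothetical policy yield period

def clock_check(clock: int, need: int, check_cost: int) -> tuple[int, bool]:
    """Simulate one clock check; return (virtual clock afterward, yielded?).

    `clock` is the virtual clock before the check, `need` the number of vcus
    the code after the check requires, and `check_cost` the actual cost of
    this particular `check` (never more than C_CHECK).
    """
    assert clock >= C_CHECK + BRANCH_COST, "precondition of the check"
    assert 0 <= check_cost <= C_CHECK and need <= Y
    clock -= check_cost            # `check` consumes at most C_CHECK vcus...
    reading = clock                # ...and reports how much time is left
    clock -= BRANCH_COST           # the comparison itself must be paid for
    if reading - BRANCH_COST >= need:
        return clock, False        # enough time left: skip the yield
    return Y, True                 # otherwise yield; the clock is reset to Y
```

Whichever branch is taken, the clock afterward covers `need` as long as need ≤ Y; the typing annotations in Figure 8.1 establish the corresponding property statically.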


8.4.2  Better  Static  Approximations 

It  is  probably  not  necessary  to  assume  that  the  virtual  clock  unit  is  the  best  statically  available 
upper  bound  on  the  execution  time  of  any  instruction.  Some  instructions  are  faster  than  others: 
a register-to-register mov instruction, for instance, probably has a smaller range of possible running times than an arithmetic instruction whose destination is a memory location. It is possible
to  take  advantage  of  this  knowledge  simply  by  discarding  the  assumption  that  each  and  every 
concrete  instruction  corresponds  to  exactly  one  tick  of  the  abstract  machine's  virtual  clock.  In 
other  words,  we  can  redefine  the  virtual  clock  unit  to  be  any  amount  of  time  we  choose  (perhaps 
most  conveniently,  we  can  set  it  equal  to  one  cycle  of  the  concrete  hardware's  clock)  and  assign 
different  virtual  costs  to  different  abstract  machine  instructions,  or  even  to  different  combinations 
of  operands. 

To be more precise, we can assign to each TALT-R instruction i a virtual cost c_i. Executing an i instruction decrements the virtual clock by c_i and takes at most c_i vcu on a concrete machine; the timing conditions in the instruction typing rules must be adjusted accordingly. All of the yield placement strategies described in Chapter 6 still work, keeping in mind that not all instructions have the same cost. It is still possible to use a clock register, but of course all the constants in the definitions of the minor clock and minor yield must be adjusted.
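Under this refinement, the cost of a straight-line block is the sum of its instructions' virtual costs rather than its length. The sketch below assumes a made-up cost table keyed by hypothetical instruction names, with the virtual clock unit set to one hardware cycle. Under that assumption, a block of three instructions can need more than three units.

```python
# Hypothetical virtual costs c_i, in concrete clock cycles. Real values would
# come from the target processor's timing documentation.
COSTS: dict[str, int] = {
    "mov_reg_reg": 1,   # register-to-register move: cheap and predictable
    "add_reg_reg": 1,
    "add_mem_reg": 4,   # arithmetic with a memory destination: slower
    "jmp": 2,
}

def block_cost(block: list[str]) -> int:
    """Upper bound on the running time of a straight-line block: the sum of c_i."""
    return sum(COSTS[op] for op in block)

def block_fits(clock: int, block: list[str]) -> bool:
    """Timing condition for entering the block: the clock must cover its cost."""
    return clock >= block_cost(block)
```

A minor yield placed before such a block would subtract `block_cost(block)` from the clock register instead of the block's instruction count.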


8.5  Bandwidth 

In  client-server  or  mobile  agent  systems,  safety  policies  must  often  address  the  consumption  of 
host  resources  other  than  time.  In  particular,  if  foreign  code  is  permitted  to  access  the  network  or 
file  system,  the  host  may  be  concerned  about  denial-of-service  attacks  based  on  excessive  use  of 
these resources. The relevant policy in such situations may be one of bandwidth limiting, which
can  be  accomplished  by  setting  a  minimum  time  that  must  elapse  between  calls  to  certain  resource- 
related  operations  (such  as  disk  reads  or  network  sends).  "Time"  might  mean  real  time,  as  in  the 
previous  section,  or  might  be  measured  in  instructions  as  in  most  of  this  thesis. 

A bandwidth-limiting policy, expressed as a lower bound on the time between events, is essentially the dual of the responsiveness policy of TALT-R, which is expressed as an upper bound. This suggests that a few minor changes to the type system might be sufficient to turn TALT-R into a theory for bandwidth certification. Where TALT-R had a designated "yielding" operation, let the bandwidth-certifying theory have a designated "consuming" operation; instead of a maximum yield period Y, specify a minimum "rest period" R that must elapse between consuming operations; and last but not least, in the register file subtyping rule, reverse the sense of the inequality constraint on the clock, thus:


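$$\frac{\Delta \vdash \tau_r \le \tau'_r \ \text{for each register } r \qquad \Delta \vdash t \le t' \ \mathit{true}}{\Delta \vdash \{eax{:}\tau_{eax}, \ldots, ebp{:}\tau_{ebp}, esp{:}\tau_{esp}, ck{:}t\} \le \{eax{:}\tau'_{eax}, \ldots, ebp{:}\tau'_{ebp}, esp{:}\tau'_{esp}, ck{:}t'\}}$$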

In this modified system, the virtual clock represents the amount of time that must pass before a consume operation is safe. Since it is safe to wait longer, this virtual clock decrements with every instruction until it reaches zero, then (rather than getting stuck) remains zero until a consume operation resets it to R. As in TALT-R, the ck term in a register file type is a conservative approximation of the virtual clock, but "conservative" here means the opposite of what it means in TALT-R: because of the constraint premise in the rule above, a register file type Γ describes a machine state in which the virtual clock is at most Γ(ck).
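The runtime behavior of this dual clock can be modelled directly. This is a minimal sketch under stated assumptions: the class name is hypothetical, instructions have integer costs, and the clock is assumed to start at zero so that the first consume is allowed immediately. Statically, a consume is permitted when the type says Γ(ck) is 0, since the real clock is at most that value.

```python
class BandwidthClock:
    """Virtual clock for a bandwidth policy with minimum rest period R.

    The clock holds the time that must still pass before a consume operation
    is safe. It decreases with every instruction but stops at zero instead of
    getting stuck, and each consume operation resets it to R.
    """

    def __init__(self, rest_period: int) -> None:
        self.rest_period = rest_period
        self.clock = 0  # assumption: a consume is permitted at start-up

    def step(self, cost: int = 1) -> None:
        """Execute an ordinary instruction costing `cost` units."""
        self.clock = max(0, self.clock - cost)

    def consume(self) -> None:
        """Perform the designated consuming operation (a send, a disk read, ...)."""
        if self.clock != 0:
            raise RuntimeError("consume attempted before the rest period elapsed")
        self.clock = self.rest_period
```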

Responsiveness involves an upper bound on elapsed time and thus requires conservatively overestimating elapsed time, that is, underestimating the time left on the clock; bandwidth involves a lower bound and requires the opposite. A tempting question is whether it is possible to accommodate a policy that places both an upper and a lower bound on the time between successive events. To accomplish this we can no longer allow unrestricted over- or underestimation; the subtyping rule must not allow any variance in the clock term:

$$\frac{\Delta \vdash \tau_r \le \tau'_r \ \text{for each register } r \qquad \Delta \vdash t = t' \ \mathit{true}}{\Delta \vdash \{eax{:}\tau_{eax}, \ldots, ebp{:}\tau_{ebp}, esp{:}\tau_{esp}, ck{:}t\} \le \{eax{:}\tau'_{eax}, \ldots, ebp{:}\tau'_{ebp}, esp{:}\tau'_{esp}, ck{:}t'\}}$$

(For convenience, rewriting of the clock term is still allowed, but the old and new terms must denote the same number.) All imprecision in static clock reasoning must now be accounted for explicitly using guarded types and singleton arithmetic. I have not investigated the implications of such policies for compilation of timing-ignorant programs.

8.6  Stack 

Ordinary  Talt  does  not  provide  any  protection  against  stack  overflow.  The  implementation  relies 
on  the  operating  system  (more  specifically,  the  virtual  memory  system)  to  detect  excessive  stack 
allocation  and  terminate  the  program,  and  the  safety  policy  reflects  this  by  specifying  that  any 
stack-growing  instruction  may  fail  and  send  the  machine  to  the  "halt"  configuration.  It  would  be 
nice  to  have  a  type  system  that  rules  out  stack  overflows,  so  that  neither  the  safety  policy  nor  the 
runtime  system  would  need  to  account  for  the  possibility  of  that  error. 
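$$\frac{\Gamma(fss) = n + t \qquad \Delta; \Psi; \Gamma\{esp : \mathrm{ns}^n \times \Gamma(esp),\ fss : t\} \vdash I}{\Delta; \Psi; \Gamma \vdash \mathtt{salloc}\ n; I}$$

$$\frac{\Delta \vdash \Gamma(esp) \le \tau_1 \times \tau_2 \qquad \Delta \vdash \tau_1 : \mathrm{T}_n \qquad \Delta \vdash \tau_2 : \mathrm{T} \qquad \Delta; \Psi; \Gamma\{esp : \tau_2,\ fss : n + \Gamma(fss)\} \vdash I}{\Delta; \Psi; \Gamma \vdash \mathtt{sfree}\ n; I}$$

Figure 8.2: Typing Rules for a Stack Usage Policy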


We can adapt TALT-R to monitor stack space usage if we replace the virtual clock with a virtual counter representing the number of words of available (unused) stack segment memory. Concretely, we change the name of the ck pseudoregister to fss ("free stack space"), so that register file types look like

$$\Gamma ::= \{eax{:}\tau_{eax}, \ldots, ebp{:}\tau_{ebp}, esp{:}\tau_{esp}, fss{:}t\}$$

One key difference between space and time is that not every instruction consumes stack space: only the stack-related instructions call, push and salloc cause the stack to grow and hence should cause fss to decrease. Another important difference is that the stack space "consumed" by one of these instructions can be recovered by a ret, pop or sfree instruction. The typing rules for non-stack-related instructions are exactly the same as in ordinary TALT; the rules for stack-related instructions are shown in Figure 8.2.

It is sound to consider a register file type Γ to be a subtype of Γ' if Γ'(fss) ≤ Γ(fss). Not surprisingly, this is the same as the rule for time in a responsiveness policy: it is always safe to forget about some of an available resource.

Clearly, in order for this static tracking of stack availability to work, there must be some equivalent of a minor yield: a running program must be able to determine how much stack space remains in order to know whether it has run out. There can be no precise equivalent of a yield, since the machine does not have an unlimited amount of virtual memory or an unbounded address space, and hence one cannot always make room for more stack allocation. Something analogous to the check instruction from Section 8.4, however, does make sense: we can have a new instruction, call it sscheck ("stack segment check"), that returns the number of bytes remaining below the stack pointer. A sequence of instructions similar to the clock check in Figure 8.1 can compare this number to the amount the program wishes to allocate, and halt (or otherwise escape) if it is too small. The sscheck instruction itself is very simple to implement: it only needs to subtract the address of the beginning of the stack segment from the address stored in the stack pointer register.
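A toy model of the runtime side of this scheme follows. It models a downward-growing stack whose free space is the distance from the segment's start to the stack pointer, which is what sscheck computes. The class and function names are hypothetical. Addresses are abstract units: the text counts fss in words and sscheck in bytes, but the sketch uses a single unit for both. `checked_salloc` plays the role of the sscheck-and-compare sequence and returns False where real code would halt or escape.

```python
class StackSegment:
    """Toy downward-growing stack occupying [base, base + size)."""

    def __init__(self, base: int, size: int) -> None:
        self.base = base
        self.sp = base + size  # empty stack: sp starts at the top

    def sscheck(self) -> int:
        """Free space below the stack pointer: sp minus the segment's start."""
        return self.sp - self.base

    def salloc(self, n: int) -> None:
        """Grow the stack by n units; a well-typed program never overflows."""
        if n > self.sscheck():
            raise MemoryError("stack overflow")
        self.sp -= n

    def sfree(self, n: int) -> None:
        """Shrink the stack by n units, giving the space back."""
        self.sp += n

def checked_salloc(seg: StackSegment, n: int) -> bool:
    """The sscheck-and-compare idiom: allocate only if n units remain."""
    if seg.sscheck() < n:
        return False  # real code would halt or otherwise escape here
    seg.salloc(n)
    return True
```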


8.7  Heap  Allocation 

Copying  garbage  collectors  typically  support  allocation  of  memory  by  providing  programs  with 
an  allocation  area  in  which  the  mutator  can  write  new  objects.  When  this  region  of  memory  is  full, 
the  collector  is  notified  and,  after  some  work,  returns  a  pointer  to  a  new,  empty  allocation  area. 
Since  new  objects  are  written  into  a  contiguous  block  of  memory,  there  is  no  need  to  consult  or 
modify  a  "free  list"  with  each  allocation,  making  this  interface  very  efficient  for  languages  and 
programs  that  create  new  heap  objects  frequently. 

In noncertified systems that use this protocol, the mutator keeps track of two pointer values to control allocation: the allocation pointer (ap), which points to the first unused word in the allocation area, and the limit pointer (lp), the address of the end of the allocation area. At any given time, there are lp − ap bytes available for allocation. To make space for an n-byte object, the mutator must first make sure that lp − ap ≥ n. If this is not the case, then the garbage collector is called and returns two new pointers defining a new allocation area that is guaranteed to be big enough. The mutator saves the value of ap, which will be the address of the new object, and updates ap to ap + n.

$$\frac{\Delta; \Psi; \Gamma \vdash o : S(t) \qquad \Delta; \Psi; \Gamma\{faa : t\} \vdash I}{\Delta; \Psi; \Gamma \vdash \mathtt{gc}\ o; I}$$

$$\frac{\Gamma(faa) = n + t \qquad \Delta; \Psi; \Gamma\{r_d : \mathrm{ns}^n,\ faa : t\} \vdash I \ \text{inits}\ r_d : \mathrm{mbox}(\mathrm{ns}^n)}{\Delta; \Psi; \Gamma \vdash \mathtt{malloc}\ r_d, n; I}$$

Figure 8.3: Typing Rules for a Heap Allocation Policy

If  a  program  must  allocate  many  objects  in  rapid  succession,  it  is  wasteful  to  perform  the 
comparison  between  ap  and  lp  for  every  object;  compilers  therefore  attempt  to  coalesce  these 
operations,  checking  once  to  ensure  the  availability  of  space  for  several  objects.  This  makes  code 
shorter,  saves  time,  and  cuts  down  the  number  of  code  points  from  which  the  collector  may  be 
called,  which  can  be  helpful  for  tag-free  collectors  that  must  be  able  to  parse  the  stack  at  every 
such  point  [68].  On  the  other  hand,  this  practice  requires  a  more  complex  safety  policy  than  the 
malloc  pseudoinstruction  of  Talt  and  Talt-R. 
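The per-object protocol and the coalesced variant can be contrasted in a toy model. In this sketch, `collect` is a hypothetical stand-in for the garbage collector that must return a fresh (ap, lp) pair with at least the requested number of free bytes. The class and function names are illustrative only.

```python
from typing import Callable

class AllocationArea:
    """Toy model of allocation in a contiguous area between ap and lp."""

    def __init__(self, ap: int, lp: int,
                 collect: Callable[[int], tuple[int, int]]) -> None:
        self.ap, self.lp = ap, lp
        self.collect = collect  # hypothetical GC entry point
        self.checks = 0         # number of limit checks executed

    def ensure(self, n: int) -> None:
        """One limit check: call the collector unless n bytes are free."""
        self.checks += 1
        if self.lp - self.ap < n:
            self.ap, self.lp = self.collect(n)
            assert self.lp - self.ap >= n

    def bump(self, n: int) -> int:
        """Allocate n bytes without a check; space must already be ensured."""
        assert self.lp - self.ap >= n
        addr, self.ap = self.ap, self.ap + n
        return addr

    def alloc(self, n: int) -> int:
        """Per-object protocol: check, then bump."""
        self.ensure(n)
        return self.bump(n)

def alloc_coalesced(area: AllocationArea, sizes: list[int]) -> list[int]:
    """Coalesced protocol: one check covers several objects."""
    area.ensure(sum(sizes))
    return [area.bump(n) for n in sizes]
```

Allocating three objects costs three limit checks with `alloc` but only one with `alloc_coalesced`.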

Creation of heap objects in a contiguous arena is in many ways analogous to stack allocation, and a type system similar to the one for stack segment management in the previous section can also capture the idiom of coalesced allocation pointer checks. Instead of ck or fss, let the register file type specify a term for faa ("free allocation area"), the number of unused bytes left in the allocation area. The two important instructions for manipulating the allocation area are gc, which causes the creation of a fresh allocation area of the requested size, and malloc. Typing rules for these instructions under such a policy are shown in Figure 8.3. The rule for gc is analogous to TALT-R's rule for yield: it has no observable effect but to reset the amount of free space to the quantity specified by the operand (terminating the program if this is not possible). The malloc rule is essentially the same as in TALT or TALT-R,1 except that it consumes n bytes when allocating an object of size n.
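Because faa is tracked statically, a checker can verify that one gc covers several later mallocs without any per-object test. The sketch below is a deliberately simplified, hypothetical checker for straight-line code: faa is a concrete integer rather than a symbolic term, and the tuple encoding of instructions is made up for illustration.

```python
def check_faa(program: list[tuple[str, int]], faa: int = 0) -> int:
    """Track the faa term through a straight-line instruction list.

    ("gc", k) resets faa to k, since the collector supplies k free bytes;
    ("malloc", n) requires faa = n + t for some t >= 0 and leaves faa = t.
    Returns the final faa, or raises TypeError if a malloc is not covered.
    """
    for op, arg in program:
        if op == "gc":
            faa = arg
        elif op == "malloc":
            if faa < arg:
                raise TypeError(f"malloc {arg} with only {faa} bytes known free")
            faa -= arg
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return faa
```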

Petersen et al. [58] have described a type theory for memory allocation based on ordered linear logic. In their calculus, called λ^ord, object creation is a three-step process: reservation creates a block of uninitialized memory, which then undergoes initialization, and allocation makes the new object available for use by the program. The object-creation area of the heap is divided into three parts: the free space that has not been touched in any way; the frontier, which is space that has been reserved but not allocated; and the portion that contains allocated objects. The type system treats the frontier specially, in that it allows updates that change the types of locations (from ns to useful types); the ordered linear typing discipline applied to the frontier prevents aliasing, so these updates can be sound.

Petersen et al. treat the reservation step as a primitive that ensures that the frontier has the requested size, calling the garbage collector if necessary. To be more precise, it compares ap (which points to the boundary between used space and the frontier) to lp (which points to the end of the free space) and notifies the collector if the difference is less than the requested size n. Having made sure there is enough free space to accommodate the request, it then "relabels" the first n bytes following ap as the new frontier; subsequent instructions can perform initialization in this region. A TALT-R-like system as described above could expose even more of the fine structure of allocation by separating the limit check (analogous to a minor yield or to sscheck), the call to the garbage collector (the gc instruction just described), and the creation of a frontier.

1 This rule is not discussed in this thesis; see Crary's TALT papers [13, 14].


8.8  Chapter  Summary 

Although  most  of  this  thesis  has  been  focused  on  Talt-R,  a  type  system  for  certifying  conformance 
to  a  specific  responsiveness  policy,  small  adjustments  to  this  system  suffice  to  enable  certification 
of  a  wide  variety  of  resource  management  policies,  both  timing-related  and  not.  Some  of  these 
changes  address  practical  shortcomings  of  TALT-R,  such  as  the  need  to  commit  to  a  specific  yield 
latency  as  part  of  the  safety  policy  or  the  imprecision  of  instruction-counting  as  a  measurement 
of  time.  Others  point  the  way  to  certifying  bounded  running  time  in  client-server  applications, 
bandwidth  limits,  stack  usage  and  proper  interaction  with  a  garbage  collector. 


Chapter  9 

Conclusions 


In this chapter, I present the results of some performance measurements of my compiler and discuss some of their implications. I go on to mention some avenues for future research in this area. Finally, I present my overall conclusions.


9.1  Performance  Evaluation 

As a preliminary performance experiment, I measured the effects of my yielding strategies on four different programs. The benchmarks range in complexity and qualitative behavior: msort applies a polymorphic merge-sort procedure to a pseudorandomly-generated linked list of integers; qsort applies quicksort to an array; comb computes a row of Pascal's triangle; and tempo is a port of the grid-based chess player developed by the ConCert project. Figure 9.1 shows the impact on execution time: for each benchmark the graph shows the execution time using Feeley yielding and Feeley polling, normalized with respect to the running time in ordinary TALT with no yielding requirements. The test programs were linked against a version of the runtime system in which the yield operation does nothing other than count the number of times it is called; thus the increases in running time are due only to function call overhead and/or clock register operations. All timing experiments were performed on a 730 MHz Pentium III desktop with 384 MB of RAM running Linux. As the chart shows, Feeley yielding slowed down programs by up to 67%, while Feeley polling never altered execution time by more than 6% in this experiment.


Figure 9.1: Normalized execution time for msort, qsort, comb and tempo (Y = 1 billion, L = 500, E = 100, H = 50)
These speed measurements clearly show that pure Feeley yielding without dynamic instruction counting is a losing strategy, and they seem to show that Feeley polling has a reasonably small effect on performance. This latter result, however, should be taken with a grain of salt, as there are many confounding factors. First of all, it is not fair to compare execution times between programs that use any particular yielding strategy and programs that perform no yields at all. The "base" version of each microbenchmark against which the others were compared is not safe with respect to the TALT-R yielding policy. Some of the difference between the base speed and the speed with
Feeley  polling  is  presumably  "the  price  you  pay  for  safety"  and  cannot  be  eliminated  in  any  safe 
version  of  the  program.  On  the  other  hand,  all  three  versions  of  each  program  were  produced 
by  essentially  the  same  compiler,  and  that  compiler  was  a  very  naive  prototype  that  performed 
essentially  no  optimization  and  stored  all  temporary  results  on  the  stack.  Since  it  did  not  do  even 
the  most  basic  kind  of  register  allocation,  it  was  completely  insensitive  to  the  increase  in  register 
pressure  that  dynamic  instruction  counting  should  have  created.  In  other  words,  improving  the 
code  quality  of  the  compiler  may  well  have  a  greater  impact  on  the  non-yielding  program  than  on 
the  Feeley  polling  program,  revealing  my  observed  6%  differences  as  artificially  small. 

An important issue faced by the implementation but not apparent in my discussion of MiniTALT-R up to now is the timing behavior of TALT-R's malloc instruction, which allocates space in a garbage-collected heap. It is difficult to predict how long an invocation of malloc will take: those that trigger a garbage collection run much longer than those that do not. For my initial experiments, I assumed that the runtime system would conservatively yield at every allocation. Table 9.1 shows that for msort and tempo, which do a lot of allocation, these "implicit yields" dominate the "explicit yields" introduced by the compiler: msort performs some 1.5 million yielding operations per second on average, of which only 0.9 per second are yield instructions. The qsort and comb benchmarks were carefully written to allocate as little as possible, and perform only a constant number of implicit yields per run.

Table 9.1: Yields under Feeley Polling

These  results  clearly  indicate  a  need  for  a  better  treatment  of  allocation.  One  possibility  is  to 
provide  a  version  of  malloc  that  has  access  to  the  program's  clock  register  and  yields  only  when 
needed.  This  "smart  malloc"  poses  no  problems  in  principle,  but  the  implementation  effort 
required  is  nontrivial  and  I  have  not  attempted  it.  To  estimate  the  performance  improvement,  I 
modified  our  implementation  to  assume  a  fixed  cost  for  malloc  (to  simulate  fast,  non-collecting 
allocations)  and  instrumented  the  runtime  system  to  count  the  number  of  garbage  collections, 
which  presumably  would  still  have  to  yield.  Table  9.2  shows  the  estimated  yield  rate  for  the 
smart  malloc  along  with  the  rates  we  measured  for  Feeley  yielding  and  polling.  In  all  cases,  the 
smart  malloc  reduces  the  total  yield  rate  to  less  than  one  hundred  yields  per  second.  All  of  the 
remaining  experiments  discussed  in  this  chapter  use  the  simulated  smart  malloc. 
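A toy model illustrates the effect this estimate relies on. The sketch is hypothetical throughout. It treats a collection as instantly handing back a fresh area of fixed size, and it charges every allocation the fixed 400-instruction cost used in the later experiments. The conservative version yields at every allocation. The smart version yields only when a collection happens or when the clock register cannot cover the allocation.

```python
MALLOC_COST = 400  # fixed non-collecting allocation cost assumed in the experiments

class Mutator:
    """Toy model comparing a conservative malloc with a "smart" one.

    `clock` stands in for the clock register (reset to Y by every yield) and
    `free` for the bytes left in the allocation area. A collection is
    modelled as handing back a fresh area of `area_size` bytes.
    """

    def __init__(self, Y: int, area_size: int) -> None:
        self.Y, self.area_size = Y, area_size
        self.clock, self.free, self.yields = Y, area_size, 0

    def _yield(self) -> None:
        self.yields += 1
        self.clock = self.Y

    def conservative_malloc(self, n: int) -> None:
        """Yield at every allocation, as in the initial experiments."""
        if self.free < n:
            self.free = self.area_size
        self._yield()
        self.clock -= MALLOC_COST
        self.free -= n

    def smart_malloc(self, n: int) -> None:
        """Yield only for a collection or when the clock cannot cover MALLOC_COST."""
        if self.free < n:
            self.free = self.area_size  # a collection, still counted as a yield
            self._yield()
        elif self.clock < MALLOC_COST:
            self._yield()
        self.clock -= MALLOC_COST
        self.free -= n
```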

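           FY            FP            FPsm (est.)
msort      3.5 × 10^6    1.5 × 10^6    6.3 × 10^0
qsort      1.8 × 10^7    4.4 × 10^1    2.3 × 10^1
comb       1.7 × 10^7    1.4 × 10^1    1.3 × 10^1
tempo      6.6 × 10^6    2.1 × 10^5    6.4 × 10^1

Table 9.2: Yield Frequencies (Yields/sec). FY = Feeley yielding; FP = Feeley polling; FPsm = Feeley polling with the simulated smart malloc (estimated).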
Figure 9.2: Effect of Y on Yielding Performance (left: Feeley Yielding, E = 100, H = 50; right: Feeley Polling, L = 500, E = 100, H = 50)


Next,  I  looked  at  the  effect  of  the  policy  yield  period  Y  on  the  observed  yield  rate  and  running 
time  of  the  benchmark  programs.  Each  program  was  compiled  for  three  different  safety  policies, 
with Y equal to 10 million, 100 million and one billion, assuming a smart malloc with a noncollecting allocation cost of 400 instructions. Figure 9.2 shows the results. The graphs on the left are for Feeley yielding, and those on the right are for Feeley polling. The yield frequency graphs show "base" values obtained by counting the number of garbage collections performed by nonyielding versions of each program and dividing by the elapsed time; since garbage collections are
counted  as  implicit  yields,  this  gives  an  idea  of  the  amount  of  programmatically  necessary  yielding 
in  each  benchmark.  As  the  figure  shows,  the  performance  of  Feeley  yielding  is  insensitive  to  the 
choice  of  Y.  In  fact,  for  each  program,  the  three  different  yielding  versions  performed  the  exact 
same  number  of  yields  per  run,  suggesting  that  the  compiler  generated  exactly  the  same  code 
regardless  of  Y.  This  is  not  surprising,  since  even  the  smallest  value  of  Y  tested  is  much,  much 
larger  than  the  longest  basic  block  length  in  any  of  the  programs.  For  Feeley  polling,  the  yield 
rate  of  the  programs  seemed  to  be  more  or  less  inversely  proportional  to  Y,  as  one  would  expect. 
The  overhead  introduced  by  Feeley  polling  was  also  greater  for  smaller  values  of  Y,  but  was  still 
never  measured  to  add  more  than  10%  to  the  running  time  of  any  benchmark. 

The  next  experiment,  whose  results  are  shown  in  Figure  9.3,  looked  at  the  effect  of  the  minor 
yield  period  L  on  yielding  performance.  As  before,  the  "base"  numbers  in  both  graphs  were 
obtained  by  measuring  the  running  time  of  non-yielding  versions  of  the  benchmarks  and  counting 
the  number  of  garbage  collections  they  performed.  The  total  yield  frequencies  and  normalized 
running  times  of  the  benchmarks  were  measured  for  versions  compiled  with  Feeley  polling  with 
minor  yield  periods  of  500,  700  and  1000  instructions.  The  impact  of  this  parameter  on  running 
time  is  small  and  does  not  exhibit  any  universal  trend;  however,  the  graphs  on  the  left  clearly  show 
that yield frequency increases as L increases. This effect is probably a different manifestation of the same phenomenon that accounts for the insensitivity of the direct Feeley yielding strategy to changes in Y: most basic blocks in most programs are sufficiently shorter than 500 instructions that the compiler places minor yields at exactly the same program points for each of these values of L, so the same number of minor yields occur per run of the program. Since each minor yield decrements the clock register by about L, larger values of L permit fewer minor yields per major yield, leading to a proportional increase in major yields. This indicates that smaller values of L are better than large values; unfortunately, it would have been very inconvenient to test values of L smaller than 500, since the assumed cost of the simulated smart malloc was 400 instructions.

Figure 9.3: Effect of L on Yielding Performance (Y = 100M, E = 100, H = 50). Left: effect of L on yield frequency, by minor yield period; right: effect of L on running time, by benchmark.
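The minor-to-major yield argument above can be checked with a small model. In this model, each major yield resets the clock register to Y, each minor yield decrements it by L, and a run executes a fixed number of minor yields. Everything beyond those two rules, including the function name and the exact reset discipline, is an assumption made for illustration.

```python
def major_yields(minor_yields: int, Y: int, L: int) -> int:
    """Count major yields in a run that executes `minor_yields` minor yields.

    Simplified model: the clock register starts at Y; each minor yield checks
    whether at least L remains, performs a major yield (resetting the
    register to Y) if not, and then subtracts L.
    """
    register = Y
    majors = 0
    for _ in range(minor_yields):
        if register < L:       # not enough left: a major yield is needed
            register = Y
            majors += 1
        register -= L
    return majors
```

Take Y = 10,000 and 1,000 minor yields as an example. Then L = 500 gives 49 major yields and L = 1,000 gives 99. When the placement of minor yields does not change, doubling L roughly doubles the number of major yields.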

Instead,  I  tested  the  hypothesis  that  the  minor-to-major  yield  ratio  was  the  determining  factor 
of  yield  frequency  by  hand-tuning  the  minor  yields  in  a  select  few  extremely  commonly  used 
functions  in  the  library  shared  by  all  TALT-R  programs.1  The  tuning  consisted  merely  of  adjusting 
the  quantity  subtracted  from  the  clock  register  in  each  minor  yield  so  that  it  corresponded  exactly 
to  the  requirements  of  the  ensuing  basic  block  (and  correcting  the  relevant  typing  annotations).  No 
minor  yields  were  added  or  removed,  and  no  other  changes  were  made  to  the  code.  Non-tuned 
portions  of  the  program  were  identical  to  those  compiled  with  L  =  500.  The  performance  figures 
for  the  resulting  "tuned"  versions  of  the  benchmarks  are  shown  alongside  the  others  in  Figure  9.3. 
As  the  graphs  show,  this  improvement  of  the  precision  of  minor  yielding,  even  when  applied 
to  a  rather  small  portion  of  each  program's  TALT-R  code,  produced  noticeable  improvements  in 
yield  frequency  for  all  four  benchmarks.  I  take  this  as  evidence  that  any  serious  implementation 
of  yielding  with  Talt-R  should  use  the  precise  yield-on-jump  strategy  described  at  the  end  of 
Chapter  6  rather  than  a  simple  minor  yield  strategy  that  treats  all  blocks  in  a  program  the  same. 

Figure  9.4  shows  the  results  of  a  very  similar  experiment  to  measure  the  effect  of  the  Feeley 
function  "cost"  E  on  yielding  performance.  Benchmarks  were  compiled  using  function  costs  of  25, 
50,  80  and  100  instructions;  the  minor  yield  period  was  500.  The  results  are  mixed:  the  comb  and 
qsort  benchmarks,  which  use  arrays  and  iteration  more  than  they  use  allocation  or  recursion, 
seemed to perform best with E equal to 80. For tempo, probably the benchmark most representative of typical Popcorn code, 100 was better. The msort benchmark yielded infrequently regardless of E. These results are less impressive than the effect of Y or of L, probably because there is a smaller range (zero to L) over which E can be varied; the fact that they are highly program-dependent suggests that a compiler capable of varying the value of E between functions (which, remember, requires interprocedural flow analysis) would perform better than one that treats all
functions  the  same.  Again,  using  precise  yield-on-jump  instead  of  a  Feeley  strategy  is  perhaps  the 
easiest  solution. 

1 Namely, the functions implementing multiplication and division in terms of shifts and addition, needed to work around the absence of native instructions for these operations in the implementation of TALT.
Figure 9.4: Effect of E on Yielding Performance (Y = 100M, L = 500, H = 50)


A major confounding factor in these experiments is that they were run under Linux, a traditional preemptive kernel; the kernel's preemption of the processes was completely unrelated to their executing the trivial stub implementation of the yield instruction. For the measurements shown in Figure 9.5, I replaced that dummy yield with one that yielded the CPU using the Linux sched_yield system call. Each program tested was run in parallel with a CPU-hungry "drone" process that also yielded in a manner consistent with the TALT-R policy.2 The figure shows, for
three  choices  of  L  and  for  "tuned"  versions  produced  as  described  earlier,  the  elapsed  time  and 
the  percentage  of  that  time  allocated  to  the  benchmark  process  as  measured  by  the  Linux  time 
utility.  The  base  times  with  respect  to  which  the  elapsed  times  are  normalized  were  measured 
by  running  a  non-yielding  version  of  the  benchmark  in  parallel  with  a  non-yielding  version  of 
the  drone  —  that  is,  by  allowing  the  Linux  kernel  to  manage  the  competition  between  them  in  its 
usual  way.  The  qsort  benchmark  was  not  used  in  this  experiment  because  it  did  not  run  long 
enough  (less  than  half  a  second)  for  the  CPU  allocation  measurements  to  be  meaningful. 

As expected, benchmark processes compiled with larger values of L, which produced higher yield rates in the experiment described earlier, took longer to run and fared less well in competition with the drone process. Also predictably, the msort benchmark, which tended to yield the
least  often  in  other  experiments,  did  the  best  when  competing  for  the  CPU,  with  the  hand-tuned 
version  even  out-competing  the  drone;  comb,  which  generally  yielded  more  often,  performed  the 
worst.  What  is  also  worth  noting  is  that  the  impact  of  L  on  elapsed  time  was  much  greater  in  this 
experiment,  where  each  major  yield  forces  a  context  switch,  than  in  the  earlier  experiment  where 
the  yield  consisted  of  a  function  call  and  little  else. 

It is discouraging that explicit yielding appears to produce such an inequitable allocation
of the CPU. Fortunately, I do not think this necessarily represents an inherent flaw in the idea of allowing
processes to participate in scheduling. Rather, one must remember that the Linux kernel is
accustomed  to  being  in  complete  control  of  the  allocation  of  time  to  processes.  The  sched_yield 
call  defeats  this  careful  design:  it  unconditionally  yields  the  remainder  of  the  current  process's 
time  slice,  and  if  another  runnable  process  exists,  the  caller  is  guaranteed  to  be  suspended.  A 
kernel  that  expected  processes  to  yield  rather  than  be  preempted  would  have  to  recognize  that 
their  yielding  patterns  would  probably  be  erratic  and  differ  between  programs,  and  would  have 
to  take  this  into  account  when  scheduling  threads  for  execution.  The  yielding  operation  provided 
by  such  a  kernel  would  represent  an  opportunity  for  a  task  switch  but  would  not  force  one  if  the 
calling  process  deserved  more  time. 
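The difference between a forced yield and a yield that merely offers the kernel an opportunity can be stated as two scheduling predicates. In this sketch the `struct proc` fields and both function names are hypothetical, and "deserving more time" is modeled, as an assumption, by a simple comparison of time used against a fair share.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-process accounting kept by a kernel that expects
   processes to yield on their own. */
struct proc {
    unsigned long used;   /* CPU time consumed in the current period  */
    unsigned long share;  /* CPU time the scheduler thinks it deserves */
};

/* sched_yield-style semantics, as described above: if any other process
   is runnable, the caller is always suspended. */
bool forced_yield_switches(const struct proc *caller, bool other_runnable) {
    assert(caller != NULL);
    return other_runnable;
}

/* Yield as an opportunity: switch only if someone else can run and the
   caller has already received at least its share of the CPU. */
bool hinted_yield_switches(const struct proc *caller, bool other_runnable) {
    assert(caller != NULL);
    return other_runnable && caller->used >= caller->share;
}
```

Under the hinted semantics, a process that yields often but has received little CPU time keeps running, which is the behavior the paragraph argues a cooperative kernel would need.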

²To be precise, the drone was a small C program, not requiring much memory, with an inner loop that performed
some integer arithmetic and yielded every m iterations, where m was calculated such that yields occurred on the order
of every Y/2 instructions.

Figure 9.5: Effect of L on Competition (Y=10M, E'=100, 17=50)


Another  flaw  in  this  attempt  to  simulate  a  cooperative  scheduling  environment  is  that  even 
though  both  the  drone  and  the  benchmark  process  in  each  trial  yielded  the  CPU  very  frequently, 
each  of  them  was  still  vulnerable  to  preemption  between  yields,  making  the  actual  impact  of 
context  switches  difficult  to  measure.  Creating  the  necessary  testing  environment  to  explore  the 
true  effects  of  static  enforcement  on  cooperative  scheduling  and  compare  the  results  to  preemptive 
scheduling  would  be  a  significant  undertaking. 


9.2  Discussion  and  Future  Directions 

In this section, I discuss some features of this work that suggest potentially worthwhile
topics for further study. These fall into two categories: those that amount to improving the
particular safety policy I have studied and the system I have implemented, and those that further
explore the potential capabilities of the techniques I have used here.

9.2.1  Improvements  to  Implemented  System 

In spite of the groundwork laid by earlier researchers at CMU and elsewhere, writing a type-preserving
compiler from scratch is still far from easy, even for a very simple source language.
The  level  of  sophistication  I  was  able  to  achieve  in  the  time  allotted  to  me  for  this  project  was  quite 
low.  Many  improvements  are  needed  if  the  system  I  built  is  to  be  useful,  or  if  measurements  of  its 
performance  are  to  be  taken  seriously.  Most  of  these  improvements,  however,  are  not  particularly 
interesting  research  avenues. 

TALT External Syntax Quite apart from the challenges of timing certification, I found the task of
generating well-typed EXTALT output from my compiler surprisingly difficult. The main culprit
was "coercions", the reified subtyping derivations with which operands must often be annotated.
EXTALT's coercions tend to be verbose, repetitive and finicky, especially those that apply to the
type of the stack. Nearly every control transfer instruction output by my compiler must coerce
the stack so that its type matches up exactly with the type expected by the target code block.
Since the stack type is a long, right-associated product of types, most of which are not changing,
the necessary coercion is a long, right-associated "product" of coercions, most of which are the
identity (and often, those that are not the identity are the forget coercion from some type to
nonsense). The structure of the coercion must mirror the structure of the stack type exactly, but
failures to do so often result in almost incomprehensible error messages. Coercions that must
"reassociate" a large product of types (e.g. turn $(\tau_1 \times (\tau_2 \times (\cdots \times \tau_n)))$ into $(((\tau_1 \times \tau_2) \times \cdots) \times \tau_n)$)
are also extremely annoying to generate.
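The mirroring requirement is mechanical enough that a compiler back end can generate stack coercions by walking the stack type. The sketch below assumes a hypothetical textual notation (`prod`, `id`, `forget`) that is not EXTALT's actual coercion syntax. It handles only the common case in which each slot gets either the identity or a forget coercion; reassociation is not covered.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Emit a coercion mirroring a right-associated stack type
   s1 x (s2 x (... x (sn x rest))).  forget[i] != 0 means slot i is coerced
   to nonsense; otherwise the slot keeps its type.  The result has the form
   prod(c1, prod(c2, ... prod(cn, id)...)), where the final "id" leaves the
   untouched tail of the stack alone.  Hypothetical notation, not EXTALT.
   Returns the length written, or -1 if the buffer is too small. */
int stack_coercion(const int *forget, int n, char *buf, size_t cap) {
    assert(n >= 0 && buf != NULL && cap > 0);
    size_t len = 0;
    buf[0] = '\0';
    for (int i = 0; i < n; i++) {                  /* open one prod per slot */
        int w = snprintf(buf + len, cap - len, "prod(%s, ",
                         forget[i] ? "forget" : "id");
        if (w < 0 || (size_t)w >= cap - len) return -1;
        len += (size_t)w;
    }
    int t = snprintf(buf + len, cap - len, "id");  /* the stack tail */
    if (t < 0 || (size_t)t >= cap - len) return -1;
    len += (size_t)t;
    for (int i = 0; i < n; i++) {                  /* close every prod( */
        if (len + 1 >= cap) return -1;
        buf[len++] = ')';
        buf[len] = '\0';
    }
    assert(strlen(buf) == len);
    return (int)len;
}
```

For a two-slot prefix in which only the second slot is forgotten, the sketch produces `prod(id, prod(forget, id))`; an elaborator that accepted "forget slot 2" directly would spare the programmer from writing this nesting by hand.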

Another locus of painful coercion is the compilation of LILT's case construct for eliminating
variant types. Since LILT's variants can have any number of summands but TALT's union types
are binary, the coercions witnessing the subtyping premises of the cmpjcc instruction typing rule
(not  to  mention  the  coercions  required  to  give  the  case  subject  a  binary  union  type  in  the  first 
place)  are  large  and  hard  to  get  right.  All  in  all,  I  estimate  that  more  than  half  of  the  time  it  took 
to  write  the  back  end  of  the  compiler  was  spent  debugging  coercions.  Something  must  be  done 
about  this. 

Some of the problems I encountered could be solved by adding support for some set of nontrivial
utility coercions to the front end of the EXTALT assembler. These could take care of things
like reassociating large products, permuting large unions or intersections, and so on. Allowing
EXTALT programs to contain "coercion definitions" akin to type definitions would also probably
result in shorter, more readable assembly code. But it is not clear whether such techniques would
be particularly useful for the problem of stack coercions. It just seems unfair that, although IA-32
programmers are all but forced to treat the first several words of the stack as if they were registers,
the EXTALT elaborator can automatically insert forget coercions for actual registers but not for
stack slots. I predict that very few people will be willing to program in EXTALT as long as this is
the  case. 

Responsiveness Implementation Improvements Some interesting questions are raised by the
effect the malloc operation was observed to have on yield rate earlier in this chapter. In particular,
the "smart malloc" simulated in the experiments is fictitious; the question of how such a
thing might be implemented reveals a larger issue, namely: When separately compiled program modules
and libraries must cooperate on time management, what protocols should govern the interfaces between
them to maximize both flexibility and performance? An even broader issue is, how far do the implications
of a particular choice of yielding strategy reach? Can the decision to use, say, Feeley versus call-return
yielding, or the decision to reserve a register for instruction counting, be viewed as an implementation
detail of a module and hidden behind a timing-agnostic interface? Or does the abstraction
boundary  introduce  an  "impedance  mismatch"  that  hurts  performance  unacceptably? 

A related question is whether the Feeley placement strategy (whether for direct yield placement
or minor yield placement) discards too much useful information by assigning all functions
the  same  fictitious  "cost".  Recall  that  the  number  E  is  an  upper  bound  on  three  quantities:  the 
"cost"  of  a  function  as  seen  by  callers,  the  number  of  instructions  between  function  entry  and  the 
first  yield,  and  the  number  of  instructions  between  the  last  yield  and  function  exit.  For  many 
functions,  the  latter  two  are  not  very  sensitive  to  the  initial  value  of  the  clock;  thus  rather  than 
choosing  a  single  cost  for  all  functions  ahead  of  time,  one  should  usually  be  able  to  compute  the 
cost  for  each  function  in  a  program  separately.  A  whole-program  analysis  would  be  needed  in 
order  to  take  advantage  of  this  more  precise  information  —  is  it  worth  it?  Only  further  study  can 
answer  that  question. 
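To illustrate the per-function alternative, the sketch below computes two of the three quantities that E currently bounds, for a single function body. It assumes, hypothetically, that a body is a straight-line list of instruction runs and yield points; a real analysis would have to take the maximum over all control-flow paths and account for calls.

```c
#include <assert.h>
#include <stddef.h>

#define YIELD (-1L)   /* hypothetical marker for a yield point */

/* Per-function timing summary for a straight-line body (no branches,
   loops or calls; a real analysis would maximize over all paths). */
struct cost_summary {
    long entry;   /* instructions from entry to the first yield */
    long exit;    /* instructions from the last yield to exit   */
    int  yields;  /* number of yield points in the body          */
};

/* body[i] > 0: a run of that many instructions; body[i] == YIELD: a yield. */
struct cost_summary summarize(const long *body, size_t n) {
    struct cost_summary s = {0, 0, 0};
    long run = 0;                       /* instructions since last yield */
    for (size_t i = 0; i < n; i++) {
        if (body[i] == YIELD) {
            if (s.yields == 0) s.entry = run;
            s.yields++;
            run = 0;
        } else {
            assert(body[i] > 0);
            run += body[i];
        }
    }
    if (s.yields == 0) s.entry = run;   /* no yield: the whole body counts */
    s.exit = run;
    return s;
}
```

A caller-side placement pass could then use each callee's `entry` and `exit` values in place of the single constant E; whether the extra precision pays for the whole-program analysis is exactly the open question above.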

When I began my investigation of responsiveness certification, it was expected that simple
strategies like Feeley yielding and dynamic checking would be insufficient to produce acceptable
performance. My performance measurements seem to suggest that this is not the case, but as I
suggested in my discussion of those results, it is possible that an aggressive optimizing compiler
would suffer more from the cost of dynamic checking than my naive compiler. If this is the case,
then further improvement of the constraint reasoning in TALT-R is called for, along with investigation
of program analyses to detect opportunities for moving, hoisting or eliminating yields or
clock checks. The static computation typical of LXres [18] is a good place to start.

The yield placement strategies I have studied were all designed with the goal of insulating
programmers from the issue of responsiveness. This is an important thing to be able to do, because
it allows experienced programmers who are accustomed to a preemptive setting (where
the operating system insulates them from responsiveness) to work on certification-based
systems without having to learn anything new. More importantly, it means that code written
for traditional operating systems (in type-safe languages) will be portable to certification-based
ones: it will only need to be recompiled. Finally, the ability of TALT-R to account for dynamic
instruction-counting schemes (like my minor yields) means that porting compilers to a
certification-based setting should also be relatively easy and need not increase their complexity by much.
Realistically, however, the benefits of certification over preemption for timing policy enforcement
cannot be fully realized without help from programmers. Here, again, the work of Crary and
Weirich on Popcorn and TALres may provide a useful starting point.
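For readers unfamiliar with dynamic instruction counting, the general idea can be sketched as follows. The sketch assumes a counter standing in for the reserved clock register and a static cost for each code block; the names and the exact form of the check are illustrative assumptions, not the code sequence my compiler emits.

```c
#include <assert.h>

/* Hypothetical runtime state.  `clock` stands in for a reserved register
   holding how many more instructions may run before a real yield. */
struct runtime {
    long clock;          /* remaining instruction budget           */
    long budget;         /* Y: budget restored after a major yield */
    long major_yields;   /* counted only for illustration          */
};

/* Stand-in for an expensive yield (e.g. a system call and context switch). */
static void major_yield(struct runtime *rt) {
    rt->major_yields++;
    rt->clock = rt->budget;
}

/* A "minor yield" placed before a block whose static cost is `cost`:
   usually just a compare and a subtract; it falls back to a major yield
   only when the remaining budget cannot cover the block. */
void minor_yield(struct runtime *rt, long cost) {
    assert(cost >= 0 && cost <= rt->budget);
    if (rt->clock < cost)
        major_yield(rt);
    rt->clock -= cost;
}
```

Because the counter never goes negative, at most Y instructions' worth of blocks run between consecutive major yields, which is the kind of invariant the type system must be able to track through the clock register.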

9.2.2  Applicability 

Real-Time  Programming  An  obvious  application  for  a  system  that  certifies  timing  properties 
is in real-time programming. As discussed in Chapter 8, TALT-R could in principle be modified
to  certify  policies  based  on  real  time  rather  than  on  instruction  counting.  The  suggestions  for 
doing  this  given  in  that  chapter,  however,  were  mostly  speculative.  Further  study  is  needed  to 
determine  whether  the  level  of  imprecision  resulting  from  conservative  estimates  of  instruction 
cost  or  the  cost  of  frequent  checks  of  the  real  clock  is  too  great  for  such  an  approach  to  be  practical. 
A  major  obstacle  to  improving  the  precision  of  static  reasoning  about  time  is,  of  course,  that  the 
cost  of  any  instruction  is  highly  dependent  on  the  dynamic  context  in  which  it  is  executed,  that  is, 
the  recent  history  of  the  process  that  determines  the  state  of  the  processor  pipelines,  caches  and 
virtual  memory.  It  is  conceivable  that  usable  levels  of  precision  can  only  be  achieved  by  reasoning 
about  the  execution  time  of  sequences  of  instructions  rather  than  individual  ones.  How  to  integrate 
such  reasoning  into  a  type  system  is,  to  my  knowledge,  an  open  problem. 


Software-Based Process Isolation I have hinted throughout this thesis that the responsiveness
policy of TALT-R is the kind of timing policy that an operating system kernel might want to enforce,
but I have not put this claim to the test. It will not be clear without further experimentation
just how realistic a yielding policy like TALT-R's is for something so central to everyday computing
life as the process scheduler of an operating system. More expressive policies may be needed
in  order  to  create  the  right  balance  between  safety  and  flexibility  that  preserves  system  reliability 
without  compromising  efficiency. 

This thesis has been primarily concerned with the type-theoretic and language-related issues
involved in certifying responsiveness, and consequently has largely ignored the concrete semantics
of the yield operation. Where concrete intuition has been required, I have assumed that
"yielding" always involves a context switch and/or interprocess communication,
but this is nowhere reflected in the operational semantics of the TALT-R abstract machine or the
static semantics of TALT-R. From the point of view of the formalism, the only thing that matters
about the yield instruction is that it must, as a matter of safety, be performed periodically with
a certain frequency. Any such operation can easily take the place of yield without changing the
type theory, but the performance characteristics of that operation probably will affect many
implementation decisions, including the choice of "yield" placement strategy. Fortunately, TALT-R
is flexible enough to support several reasonable choices and extensible enough to support many
more.

Indeed, from an efficiency point of view, forcing processes to surrender control of the machine
after every Y instructions (or nanoseconds) is not a very good policy. It might very well be better,
if the hardware supports it, to require merely that a program examine some flag periodically,
and yield control whenever it finds the flag set. This is the behavior widely known as "polling",
although I have used that word in this thesis for something slightly different. Although their typing
properties are identical, this "poll" operation differs from a true yield in that it is fast in the
common case: usually, no event will have occurred to set the flag, so no context switch will be
needed and the program will be able to proceed. In fact, it was for this kind of polling that Feeley
designed his placement strategy [23]. The positive results of his experiments suggest that direct
placement of polling operations using the Feeley strategy should be a viable approach to compilation
of timing-ignorant programs in this setting. (In particular, since the poll operation is fast,
one would expect little or nothing to be gained from dynamic instruction counting using a clock
register.) Importantly, this is no different from TALT-R with yields as far as typing, certification
and safety are concerned; it has only to do with mapping the TALT-R abstract machine to concrete
hardware in the most useful way possible for the application at hand.
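The fast common case of such a poll operation can be sketched in a few lines of C. The flag, the counter and `real_yield` are hypothetical stand-ins; the flag is declared `volatile sig_atomic_t` on the assumption that an asynchronous agent such as a signal handler might set it.

```c
#include <assert.h>
#include <signal.h>   /* sig_atomic_t */

/* Flag raised by some outside agent (a timer signal handler, say, or the
   kernel through shared memory) when the process should give up the CPU.
   Hypothetical names throughout. */
static volatile sig_atomic_t yield_requested = 0;
static long context_switches = 0;

/* Stand-in for an expensive real yield. */
static void real_yield(void) {
    context_switches++;
}

/* The "poll" operation: placed exactly where a yield would be, but in the
   common case it costs only a load and a branch. */
void poll_point(void) {
    if (yield_requested) {
        yield_requested = 0;
        real_yield();
    }
}

/* Simulate a loop with one poll per iteration; the flag is raised at
   iteration `request_at` (never, if negative).  Returns the number of
   context switches the loop caused. */
long simulate(long iters, long request_at) {
    assert(iters >= 0);
    long before = context_switches;
    for (long i = 0; i < iters; i++) {
        if (i == request_at)
            yield_requested = 1;
        poll_point();
    }
    return context_switches - before;
}
```

A thousand poll points with no pending request cause no context switches at all, and a single request causes exactly one, whereas a thousand true yields would each surrender the CPU.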

The  difference  between  yielding  and  polling  (in  this  more  usual  sense)  is  a  manifestation  of 
a  deep  question  opened  by  the  availability  of  certification-based  enforcement  for  timing  policies, 
namely:  Who  should  decide  when  a  process  yields?  More  broadly,  what  should  the  respective  roles  of  the 
operating  system  and  a  user  process  be  in  managing  resources?  The  answer  is  not  obvious:  on  the  one 
hand,  only  the  operating  system  knows  enough  about  the  state  of  all  the  computer's  hardware  and 
software  to  know  when  a  yield  is  needed;  on  the  other,  only  the  user  process  knows  when  a  yield 
will  disrupt  it  the  least.  If  user  processes  are  allowed  to  yield  on  their  own  terms,  the  cost  of  saving 
useless  state  can  be  reduced  and  yields  can  be  arranged  not  to  fall  in  the  middle  of  cache-sensitive 
inner  loops;  but  unless  they  are  guaranteed  to  yield  soon  enough  after  a  yield  becomes  necessary, 
the  OS  will  have  no  choice  but  to  incur  the  cost  of  preemptive  management.  The  tradeoffs  between 
preemptive  and  non-preemptive  scheduling,  not  unknown  territory  to  operating  systems  experts, 
must  be  reexamined  in  light  of  the  fact  that  with  certification,  preemptive  systems  no  longer  have 
a  monopoly  on  stability,  reliability  or  fairness. 


9.3  Conclusion 


The  world  of  code  certification  is  at  a  turning  point.  We  have  more  or  less  mastered  the  type 
theory,  compilation  techniques  and  logics  needed  to  ensure  a  baseline  of  type  and  memory  safety 
in  small  to  medium-sized  chunks  of  untrusted  code.  The  focus  of  much  of  the  work  so  far  has 
been  on  mobile  code  applications,  including  applets,  mobile  agents,  and  large-scale  distributed 
computing;  these  applications  have  in  common  the  fact  that  they  run  on  top  of  modern  desktop 
or  server  operating  systems  on  which  the  host  environments  can  rely  for  many  aspects  of  safety, 
such  as  resource  usage,  that  the  certification  does  not  cover.  Our  attention  is  now  turning  toward 
increasingly  expressive  safety  policies  and  increasingly  powerful  certification  technologies  that 
can  take  over  more  and  more  of  these  duties  traditionally  assigned  to  operating  systems. 

There  are  a  number  of  rewards  to  be  found  just  around  this  bend.  Just  as  statically  type-safe 
programming  languages  admit  more  compiler  optimizations  and  allow  programs  to  run  with  less 
safety-related overhead than dynamically typed languages, so too can static enforcement of safety
policies  by  code  certification  improve  the  performance  of  software.  By  making  it  safe  to  lower  the 
hardware-based  barriers  between  processes,  between  components  of  a  process,  and  even  between 
the  kernel  and  user  processes,  static  enforcement  creates  the  potential  for  tighter  coupling  between 
components,  resulting  in  leaner  and  more  efficient  computer  systems.  This  potential,  if  realized, 
will  benefit  not  only  conventional  desktop  and  server  systems  but  also  the  growing  number  of 
smaller  devices,  such  as  mobile  phones,  handheld  computers  and  smart  cards,  for  which  power 
and  memory  are  still  scarce  resources. 

I  have  argued  in  this  thesis  that  timing  policies  play  critical  roles  in  the  security  and  reliability 
of  real  systems.  If  we  are  to  take  full  advantage  of  the  power  of  certified  code,  we  must  be  able 
to  enforce  such  policies  statically.  To  show  that  this  is  possible,  I  have  studied  the  certification 
of  an  extremely  common  timing  policy  that  I  call  responsiveness.  I  have  exhibited  some  strategies 
for  compiling  programs  in  a  general  intermediate  language  so  that  they  satisfy  this  policy,  and  I 
have  shown  that  a  surprisingly  simple  type  theory  suffices  to  prove  that  programs  compiled  using 
these techniques are responsive. My theory, called TALT-R, fits well into an existing certification
framework  and  is  easily  extended  to  a  wide  range  of  timing  and  resource  management  policies. 
I  hope  that  the  work  I  have  described  here  constitutes  some  first  steps  toward  static  enforcement 
strategies  for  these  policies  that  can  truly  compete  with  the  currently  popular  dynamic  approach. 


Appendix  A 

Complete  MiniTALT  Semantics 


Where possible, the inference rules in this appendix are labeled with the names of the corresponding
rules in the Twelf formalization of TALT type safety; these labels are in typewriter font.
Due to differences between MiniTALT and full TALT, and between the media of typeset inference
rules and Twelf code, this correspondence is rough. The Twelf version of a rule may differ significantly
from that presented here. Rules in this appendix that do not correspond to anything in the
Twelf codebase are given ad hoc names, which are written in SMALL CAPS.


A.1 Static Semantics

Static  Term  Formation 


A.1.1 


A  b  c  :  K,  A  b  T 


{(a:K)  G  A) 
Ah  a:  I< 


(KOF.VAR) 


A  h  nsi  :  T i 


(kof_ns) 


A  h  Bi  :  T i 


(kof_b) 


A  h  B  :  N 


(kof_numlit) 


A  h  Ti  :  T  A  h  T2  :  T  ,  ,  A  b  n  :  TD  A  b  r2  :  TD  . 

(kof.prod)  - t— - - — zpb- -  (KOF_PROD_D) 


A  b  Ti  x  72  :  T 
A  b  n  :  Ti  A  b  r2  :  T  j 


A  b  Ti  x  r2  :  T(i  +  j) 
A  b  r : TD 


(KOF_PROD_l) 


A  b  r  :  T 


A  b  box(r)  :  TW 


A  b  T\  x  t2  :  TD 
(kof_box) 


A  b  r  :  T 


A  b  mbox(r)  :  TW 


(kof  _mbox) 


A  b  sptr(r)  :  TW 
A  b  r  :  Ti 


(kof.sptr)  (ko£_exp)  A  h  r  .  TD  Ah,  :  N  (KOF_Exp_D) 


Abr|B:T(bB) 
Abi:N 


(kof_exp_i) 


A  b  set=(a;)  :  TW 

A,  a:K  b  r  :  T 
A  b  VccK.t  :  T 
A,  a:K  b  r  :  T 


(kof_seteq) 


Abril  :  T 

A  b  t  :  T 
A  h  t  t  0  :  TO 

Aba;:  N 

A  b  set<(a;)  :  TW 


(kof_exp_z) 


A  b  t]x : TD 
A  b  r 


(kof.setlt) 


A  b  T  — >  0 :  TW 
A  b  x  :  N 


A  b  set>(x)  :  TW 


(kof.arrow) 

(kof  _setgt) 


(K'  G  {TD,T*})  A  be:  if 
A,  a:K  br  :  T  A  b  r[c/a]  :  K' 

(kof-forall)  - A  b  \/a:K.r  :  K' -  (KOF_FORALL_Dl) 


A  b  3a:K.r  :  T 

A, a:T  br  :  T 
A  b  /ia.T  :  T 


/\  cxiTC  I —  t  :  Ti 

(kof_exists)  -r-; — — — — - —  (KOF_EXISTS_l)  -r— : - —  (kof_void) 

v  ’  A  b  3a:K.r  :  Ti  v  '  Al - -  -1  •  v  ’ 


A  b  void  :  Ti 


{K  G  {TD,  Ti}) 

A,  a: T  b  r  :  T  A  b  r\fia.T/a]  :  I\ 

(kof.rec)  - — - - -  (KOF_REC_Dl) 

v  ;  A  b  na.T  :  I<  v  ’ 
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A  I-  n  :  T  A  b  72  :  T  ,  A  b  n  :  TD  A  b  r2  :  T  . 

(kof_meet)  - -r-1- - 3 — -  (KOF_MEET_Dl) 


A  b  t\  A  T2  :  T 
A  b  n  :  T  A  b  r2  :  TD 


A  b  n  A  r2  :  T  D 
A  h  n  :  T  A  b  r2  :  Ti 


(kof_meet_d2) 


A  h  t\  A  r2  :  T  i 
A  b  n  :  Ti  A  b  r2  :  Ti 
A  b  n  V  r2  :  Ti 


(kof_meet_i2) 


A  b  n  A  r2  :  TD 
A  b  T\  :  T  i  A  h  r2  :  T 
A  b  n  A  r2  :  Ti 
A  h  n  :  T  A  b  r2  :  T 


A  b  n  V  r2  :  T 


(KOF_MEETjl) 

(kof_join) 


(kof_join_i) 


A  h r : TD 
A  h  r  :  T 


(kof_d) 


:  Ti 


A  b  T : TD 


(KOF.I) 


A  h  Tsp  :  TD  Ah  Tr  :  TW  for  r  £  {ax, ....  bp} 
A  h  |eax:rax, . . . ,  ebp:rbp,  esp:rsp} 


(rtpok_) 


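A.1.2 Static Term Equivalence: $c_1 = c_2$ and $\Gamma = \Gamma'$

$$\frac{}{c = c}\ (\texttt{equiv\_reflex}) \qquad \frac{c_2 = c_1}{c_1 = c_2}\ (\texttt{equiv\_symm}) \qquad \frac{c_1 = c_2 \quad c_2 = c_3}{c_1 = c_3}\ (\texttt{equiv\_trans})$$

$$\frac{}{(\lambda\alpha{:}K_2.c_1)\,c_2 = c_1[c_2/\alpha]}\ (\texttt{equiv\_beta}) \qquad \frac{(\alpha \text{ not free in } c)}{\lambda\alpha{:}K.(c\,\alpha) = c}\ (\texttt{equiv\_eta})$$

$$\frac{c_1 = c_2}{\lambda\alpha{:}K_1.c_1 = \lambda\alpha{:}K_1.c_2}\ (\texttt{equiv\_lam}) \qquad \frac{c_1 = c_1' \quad c_2 = c_2'}{c_1\,c_2 = c_1'\,c_2'}\ (\texttt{equiv\_app})$$

$$\frac{x_1 = x_2}{\mathsf{set}_{=}(x_1) = \mathsf{set}_{=}(x_2)}\ (\texttt{equiv\_seteq}) \qquad \frac{x_1 = x_2}{\mathsf{set}_{<}(x_1) = \mathsf{set}_{<}(x_2)}\ (\texttt{equiv\_setlt}) \qquad \frac{x_1 = x_2}{\mathsf{set}_{>}(x_1) = \mathsf{set}_{>}(x_2)}\ (\texttt{equiv\_setgt})$$

$$\frac{\tau_1 = \tau_1' \quad \tau_2 = \tau_2'}{\tau_1 \times \tau_2 = \tau_1' \times \tau_2'}\ (\texttt{equiv\_prod}) \qquad \frac{\tau = \tau' \quad x = x'}{\tau{\uparrow}x = \tau'{\uparrow}x'}\ (\texttt{equiv\_exp}) \qquad \frac{\Gamma = \Gamma'}{\Gamma \to 0 = \Gamma' \to 0}\ (\texttt{equiv\_arrow})$$

$$\frac{\tau = \tau'}{\mathsf{box}(\tau) = \mathsf{box}(\tau')}\ (\texttt{equiv\_box}) \qquad \frac{\tau = \tau'}{\mathsf{mbox}(\tau) = \mathsf{mbox}(\tau')}\ (\texttt{equiv\_mbox}) \qquad \frac{\tau = \tau'}{\mathsf{sptr}(\tau) = \mathsf{sptr}(\tau')}\ (\texttt{equiv\_sptr})$$

$$\frac{\tau = \tau'}{\forall\alpha{:}K.\tau = \forall\alpha{:}K.\tau'}\ (\texttt{equiv\_forall}) \qquad \frac{\tau = \tau'}{\exists\alpha{:}K.\tau = \exists\alpha{:}K.\tau'}\ (\texttt{equiv\_exists})$$

$$\frac{\tau_1 = \tau_1' \quad \tau_2 = \tau_2'}{\tau_1 \wedge \tau_2 = \tau_1' \wedge \tau_2'}\ (\texttt{equiv\_meet}) \qquad \frac{\tau_1 = \tau_1' \quad \tau_2 = \tau_2'}{\tau_1 \vee \tau_2 = \tau_1' \vee \tau_2'}\ (\texttt{equiv\_join})$$

$$\frac{\tau_{sp} = \tau_{sp}' \quad \tau_r = \tau_r' \text{ for } r \in \{ax, \ldots, bp\}}{\{\mathit{eax}{:}\tau_{ax}, \ldots, \mathit{ebp}{:}\tau_{bp}, \mathit{esp}{:}\tau_{sp}\} = \{\mathit{eax}{:}\tau_{ax}', \ldots, \mathit{ebp}{:}\tau_{bp}', \mathit{esp}{:}\tau_{sp}'\}}\ (\texttt{equivr})$$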


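A.1.3 Subtyping: $\Delta \vdash \tau_1 \le \tau_2$ and $\Delta \vdash \Gamma \le \Gamma'$

$$\frac{}{\Delta \vdash \tau \le \tau}\ (\texttt{reflex}) \qquad \frac{\tau_1 = \tau_2}{\Delta \vdash \tau_1 \le \tau_2}\ (\texttt{reflexeq}) \qquad \frac{\Delta \vdash \tau_1 \le \tau_3 \quad \Delta \vdash \tau_3 \le \tau_2}{\Delta \vdash \tau_1 \le \tau_2}\ (\texttt{trans})$$

$$\frac{\Delta \vdash \tau_1 \le \tau_1' \quad \Delta \vdash \tau_2 \le \tau_2'}{\Delta \vdash \tau_1 \times \tau_2 \le \tau_1' \times \tau_2'}\ (\texttt{prod\_sub}) \qquad \frac{\Delta \vdash \tau \le \tau'}{\Delta \vdash \tau{\uparrow}x \le \tau'{\uparrow}x}\ (\texttt{exp\_sub}) \qquad \frac{\Delta \vdash \Gamma' \le \Gamma}{\Delta \vdash \Gamma \to 0 \le \Gamma' \to 0}\ (\texttt{arrow\_sub})$$

$$\frac{\Delta \vdash \tau \le \tau'}{\Delta \vdash \mathsf{box}(\tau) \le \mathsf{box}(\tau')}\ (\texttt{box\_sub}) \qquad \frac{\cdots}{\Delta \vdash \mathsf{mbox}(\tau) \le \mathsf{mbox}(\tau')}\ (\texttt{mbox\_sub}) \qquad \frac{}{\Delta \vdash \mathsf{mbox}(\tau) \le \mathsf{box}(\tau)}\ (\texttt{forgetm})$$

$$\frac{\Delta \vdash \tau \le \tau' \quad \Delta \vdash \tau : \mathsf{T}_D \quad \Delta \vdash \tau' : \mathsf{T}_D}{\Delta \vdash \mathsf{sptr}(\tau) \le \mathsf{sptr}(\tau')}\ (\texttt{sptr\_sub})$$

$$\frac{\Delta, \alpha{:}K \vdash \tau \le \tau'}{\Delta \vdash \forall\alpha{:}K.\tau \le \forall\alpha{:}K.\tau'}\ (\texttt{forall\_sub}) \qquad \frac{\Delta, \alpha{:}K \vdash \tau \le \tau'}{\Delta \vdash \exists\alpha{:}K.\tau \le \exists\alpha{:}K.\tau'}\ (\texttt{exists\_sub})$$

$$\frac{\Delta, \alpha{:}K \vdash \tau : \mathsf{T} \quad \Delta \vdash c : K}{\Delta \vdash \forall\alpha{:}K.\tau \le \tau[c/\alpha]}\ (\texttt{forall\_elim}) \qquad \frac{\Delta, \alpha{:}K \vdash \tau : \mathsf{T} \quad \Delta \vdash c : K}{\Delta \vdash \tau[c/\alpha] \le \exists\alpha{:}K.\tau}\ (\texttt{exists\_intro})$$

$$\frac{(\alpha \notin \mathrm{FV}(\tau))}{\Delta \vdash \tau \le \forall\alpha{:}K.\tau}\ (\texttt{gen}) \qquad \frac{(\alpha \notin \mathrm{FV}(\tau))}{\Delta \vdash \exists\alpha{:}K.\tau \le \tau}\ (\texttt{cogen}) \qquad \frac{\Delta \vdash \tau : \mathsf{T}_i}{\Delta \vdash \tau \le \mathsf{ns}_i}\ (\textsc{ns\_sub})$$

$$\frac{\Delta, \alpha{:}\mathsf{T} \vdash \tau : \mathsf{T}}{\Delta \vdash \tau[\mu\alpha.\tau/\alpha] \le \mu\alpha.\tau}\ (\texttt{rec\_intro}) \qquad \frac{\Delta, \alpha{:}\mathsf{T} \vdash \tau : \mathsf{T}}{\Delta \vdash \mu\alpha.\tau \le \tau[\mu\alpha.\tau/\alpha]}\ (\texttt{rec\_elim})$$

$$\frac{\Delta \vdash \tau \le \tau_1 \quad \Delta \vdash \tau \le \tau_2}{\Delta \vdash \tau \le \tau_1 \wedge \tau_2}\ (\texttt{meet\_intro}) \qquad \frac{\Delta \vdash \tau_2 : \mathsf{T}}{\Delta \vdash \tau_1 \wedge \tau_2 \le \tau_1}\ (\texttt{meet\_elim1}) \qquad \frac{\Delta \vdash \tau_1 : \mathsf{T}}{\Delta \vdash \tau_1 \wedge \tau_2 \le \tau_2}\ (\texttt{meet\_elim2})$$

$$\frac{\Delta \vdash \tau : \mathsf{T}}{\Delta \vdash \mathsf{void} \le \tau}\ (\texttt{void\_sub}) \qquad \frac{\Delta \vdash \tau_2 : \mathsf{T}}{\Delta \vdash \tau_1 \le \tau_1 \vee \tau_2}\ (\texttt{join\_intro1}) \qquad \frac{\Delta \vdash \tau_1 : \mathsf{T}}{\Delta \vdash \tau_2 \le \tau_1 \vee \tau_2}\ (\texttt{join\_intro2})$$

$$\frac{}{\Delta \vdash \tau \wedge (\tau_1 \vee \tau_2) \le (\tau \wedge \tau_1) \vee (\tau \wedge \tau_2)}\ (\texttt{meet\_dist\_join}) \qquad \frac{\Delta \vdash \tau_1 : \mathsf{T}_i \quad \Delta \vdash \tau_1' : \mathsf{T}_i}{\Delta \vdash (\tau_1 \times \tau_2) \wedge (\tau_1' \times \tau_2') \le (\tau_1 \wedge \tau_1') \times (\tau_2 \wedge \tau_2')}\ (\texttt{meet\_dist\_prod})$$

$$\frac{}{\Delta \vdash \tau \times (\tau_1 \vee \tau_2) \le (\tau \times \tau_1) \vee (\tau \times \tau_2)}\ (\texttt{prod\_dist\_join1}) \qquad \frac{}{\Delta \vdash (\tau_1 \vee \tau_2) \times \tau \le (\tau_1 \times \tau) \vee (\tau_2 \times \tau)}\ (\texttt{prod\_dist\_join2})$$

$$\frac{\Delta \vdash \tau : \mathsf{T}}{\Delta \vdash \tau \times \mathsf{void} \le \mathsf{void}}\ (\texttt{prod\_dist\_void1}) \qquad \frac{\Delta \vdash \tau : \mathsf{T}}{\Delta \vdash \mathsf{void} \times \tau \le \mathsf{void}}\ (\texttt{prod\_dist\_void2})$$

$$\frac{}{\Delta \vdash \tau_1 \times (\tau_2 \times \tau_3) \le (\tau_1 \times \tau_2) \times \tau_3}\ (\texttt{lassoc}) \qquad \frac{}{\Delta \vdash (\tau_1 \times \tau_2) \times \tau_3 \le \tau_1 \times (\tau_2 \times \tau_3)}\ (\texttt{rassoc})$$

$$\frac{}{\Delta \vdash \tau \le \mathsf{B}_0 \times \tau}\ (\texttt{luniti}) \qquad \frac{}{\Delta \vdash \mathsf{B}_0 \times \tau \le \tau}\ (\texttt{lunite}) \qquad \frac{}{\Delta \vdash \tau \le \tau \times \mathsf{B}_0}\ (\texttt{runiti}) \qquad \frac{}{\Delta \vdash \tau \times \mathsf{B}_0 \le \tau}\ (\texttt{runite})$$

$$\frac{\Delta \vdash \tau : \mathsf{T}}{\Delta \vdash \tau{\uparrow}B \le \tau^{B}}\ (\texttt{explode}) \qquad \frac{\Delta \vdash \tau : \mathsf{T}}{\Delta \vdash \tau^{B} \le \tau{\uparrow}B}\ (\texttt{implode})$$

$$\frac{}{\Delta \vdash \mathsf{set}_{=}(B) \le \mathsf{B}_W}\ (\texttt{seteq\_forget}) \qquad \frac{}{\Delta \vdash \mathsf{set}_{<}(B) \le \mathsf{B}_W}\ (\texttt{setlt\_forget}) \qquad \frac{}{\Delta \vdash \mathsf{set}_{>}(B) \le \mathsf{B}_W}\ (\texttt{setgt\_forget})$$

$$\frac{}{\Delta \vdash \mathsf{set}_{=}(B) \wedge \mathsf{set}_{>}(B) \le \mathsf{void}}\ (\texttt{raa\_gt}) \qquad \frac{}{\Delta \vdash \mathsf{set}_{=}(B) \wedge \mathsf{set}_{<}(B) \le \mathsf{void}}\ (\texttt{raa\_lt}) \qquad \frac{}{\Delta \vdash \mathsf{set}_{<}(B) \wedge \mathsf{set}_{>}(B) \le \mathsf{void}}\ (\texttt{raa\_ltgt})$$

$$\frac{}{\Delta \vdash \mathsf{B}_W \le \exists\alpha{:}\mathsf{N}.\mathsf{set}_{=}(\alpha)}\ (\texttt{focus})$$

$$\frac{(B_1 \le B_2)}{\Delta \vdash \mathsf{set}_{<}(B_1) \le \mathsf{set}_{<}(B_2)}\ (\texttt{subrange\_lt}) \qquad \frac{(B_1 \ge B_2)}{\Delta \vdash \mathsf{set}_{>}(B_1) \le \mathsf{set}_{>}(B_2)}\ (\texttt{subrange\_gt})$$

$$\frac{(B_1 < B_2)}{\Delta \vdash \mathsf{set}_{=}(B_1) \le \mathsf{set}_{<}(B_2)}\ (\texttt{subrange\_eqlt}) \qquad \frac{(B_1 > B_2)}{\Delta \vdash \mathsf{set}_{=}(B_1) \le \mathsf{set}_{>}(B_2)}\ (\texttt{subrange\_eqgt})$$

$$\frac{\Delta \vdash \tau_{sp} \le \tau_{sp}' \quad \Delta \vdash \tau_r \le \tau_r' \text{ for } r \in \{ax, \ldots, bp\}}{\Delta \vdash \{\mathit{eax}{:}\tau_{ax}, \ldots, \mathit{ebp}{:}\tau_{bp}, \mathit{esp}{:}\tau_{sp}\} \le \{\mathit{eax}{:}\tau_{ax}', \ldots, \mathit{ebp}{:}\tau_{bp}', \mathit{esp}{:}\tau_{sp}'\}}\ (\textsc{subrtype})$$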
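Reflexivity and the four subrange rules can be read directly as a decision procedure for the integer-range fragment of the type language. The C sketch below uses hypothetical names (`struct set_type`, `set_sub`, `member`) and implements only those rules, so it answers "no" to subtypings that would need transitivity, meets or the raa rules. It also includes a semantic membership test, which makes it easy to spot-check that each rule is sound on small examples.

```c
#include <assert.h>
#include <stdbool.h>

/* Integer singleton and range types: set=(B), set<(B), set>(B). */
enum set_kind { SET_EQ, SET_LT, SET_GT };
struct set_type { enum set_kind kind; long bound; };

/* Semantic reading: does the integer v inhabit t? */
bool member(long v, struct set_type t) {
    switch (t.kind) {
    case SET_EQ: return v == t.bound;
    case SET_LT: return v < t.bound;
    case SET_GT: return v > t.bound;
    }
    assert(!"unknown set kind");
    return false;
}

/* Decide t1 <= t2 using only reflexivity and the subrange rules. */
bool set_sub(struct set_type t1, struct set_type t2) {
    long b1 = t1.bound, b2 = t2.bound;
    if (t1.kind == t2.kind) {
        switch (t1.kind) {
        case SET_EQ: return b1 == b2;    /* reflex */
        case SET_LT: return b1 <= b2;    /* subrange_lt */
        case SET_GT: return b1 >= b2;    /* subrange_gt */
        }
    }
    if (t1.kind == SET_EQ && t2.kind == SET_LT) return b1 < b2;  /* subrange_eqlt */
    if (t1.kind == SET_EQ && t2.kind == SET_GT) return b1 > b2;  /* subrange_eqgt */
    return false;   /* not derivable by these rules alone */
}
```

For example, $\mathsf{set}_{=}(3) \le \mathsf{set}_{<}(5)$ holds by subrange_eqlt, while $\mathsf{set}_{<}(5) \le \mathsf{set}_{<}(3)$ does not.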
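A.1.4 Operand Typing: $\Delta; \Psi; \Gamma \vdash o : \tau$

$$\frac{}{\Delta;\Psi;\Gamma \vdash B : \mathsf{set}_{=}(B)}\ (\textsc{oof\_seteq}) \qquad \frac{}{\Delta;\Psi;\Gamma \vdash \ell : \Psi(\ell)}\ (\texttt{oof\_pointer}) \qquad \frac{}{\Delta;\Psi;\Gamma \vdash r : \Gamma(r)}\ (\texttt{oof\_rco}) \qquad \frac{}{\Delta;\Psi;\Gamma \vdash \mathit{esp} : \mathsf{sptr}(\Gamma(\mathit{esp}))}\ (\texttt{oof\_spco})$$

$$\frac{\Delta;\Psi;\Gamma \vdash o : \mathsf{sptr}(\tau_1 \times \tau_2 \times \tau_3) \quad \Delta \vdash \tau_1 : \mathsf{T}_n \quad \Delta \vdash \tau_2 : \mathsf{T}_m \quad \Delta \vdash \Gamma(\mathit{esp}) \le \tau \times \tau_1 \times \tau_2 \times \tau_3}{\Delta;\Psi;\Gamma \vdash \mathsf{m}^{m}[o+n] : \tau_2}\ (\texttt{oof\_zco})$$

$$\frac{\Delta;\Psi;\Gamma \vdash o : \mathsf{box}(\tau_1 \times \tau_2 \times \tau_3) \quad \Delta \vdash \tau_1 : \mathsf{T}_n \quad \Delta \vdash \tau_2 : \mathsf{T}_m}{\Delta;\Psi;\Gamma \vdash \mathsf{m}^{m}[o+n] : \tau_2}\ (\texttt{oof\_mco})$$

$$\frac{\Delta;\Psi;\Gamma \vdash o_1 : \mathsf{box}((\tau_1 \times \tau_2 \times \tau_3){\uparrow}x) \quad \Delta;\Psi;\Gamma \vdash o_2 : \mathsf{set}_{<}(x) \quad \Delta \vdash \tau_1 : \mathsf{T}_n \quad \Delta \vdash \tau_2 : \mathsf{T}_m \quad \Delta \vdash \tau_1 \times \tau_2 \times \tau_3 : \mathsf{T}_k}{\Delta;\Psi;\Gamma \vdash \mathsf{m}^{m}[o_1 + n + k \cdot o_2] : \tau_2}\ (\texttt{oof\_imco})$$

$$\frac{\Delta;\Psi;\Gamma \vdash o : \mathsf{B}_W \quad \Delta \vdash x : \mathsf{N}}{\Delta;\Psi;\Gamma \vdash o : \mathsf{set}_{<}(x) \vee \mathsf{set}_{=}(x) \vee \mathsf{set}_{>}(x)}\ (\textsc{oof\_trichotomy}) \qquad \frac{\Delta;\Psi;\Gamma \vdash o : \tau' \quad \Delta \vdash \tau' \le \tau}{\Delta;\Psi;\Gamma \vdash o : \tau}\ (\texttt{oof\_subsume})$$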


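A.1.5 Destination Typing: $\Delta; \Psi; \Gamma \vdash d : \tau \to \Gamma'$

$$\frac{\Delta \vdash \tau : \mathsf{T}_W}{\Delta;\Psi;\Gamma \vdash r : \tau \to \Gamma\{r{:}\tau\}}\ (\texttt{update\_rdest}) \qquad \frac{\Delta \vdash \tau \le \mathsf{sptr}(\tau_2) \quad \Delta \vdash \Gamma(\mathit{esp}) \le \tau_1 \times \tau_2}{\Delta;\Psi;\Gamma \vdash \mathit{esp} : \tau \to \Gamma\{\mathit{esp}{:}\tau_2\}}\ (\texttt{update\_spdest})$$

$$\frac{\Delta;\Psi;\Gamma \vdash o : \mathsf{mbox}(\tau_1 \times \tau_2 \times \tau_3) \quad \Delta \vdash \tau_1 : \mathsf{T}_n \quad \Delta \vdash \tau_2 : \mathsf{T}_m}{\Delta;\Psi;\Gamma \vdash \mathsf{m}^{m}[o+n] : \tau_2 \to \Gamma}\ (\texttt{update\_mdest})$$

$$\frac{\Delta \vdash \Gamma(r) \le \mathsf{sptr}(\tau_1 \times \tau_2 \times \tau_3) \quad \Delta \vdash \Gamma(\mathit{esp}) \le \tau \times \tau_1 \times \tau_2 \times \tau_3 \quad \Delta \vdash \tau_1 : \mathsf{T}_n \quad \Delta \vdash \tau_2 : \mathsf{T}_m \quad \Delta \vdash \tau_2' : \mathsf{T}_m}{\Delta;\Psi;\Gamma \vdash \mathsf{m}^{m}[r+n] : \tau_2' \to \Gamma\{\mathit{esp}{:}\tau \times \tau_1 \times \tau_2' \times \tau_3,\ r{:}\mathsf{sptr}(\tau_1 \times \tau_2' \times \tau_3)\}}\ (\texttt{update\_zdest})$$

$$\frac{\Delta \vdash \tau_1 : \mathsf{T}_n \quad \Delta \vdash \tau_2 : \mathsf{T}_m \quad \Delta \vdash \tau_1 \times \tau_2 \times \tau_3 : \mathsf{T}_k \quad \Delta;\Psi;\Gamma \vdash o_1 : \mathsf{mbox}((\tau_1 \times \tau_2 \times \tau_3){\uparrow}x) \quad \Delta;\Psi;\Gamma \vdash o_2 : \mathsf{set}_{<}(x)}{\Delta;\Psi;\Gamma \vdash \mathsf{m}^{m}[o_1 + n + k \cdot o_2] : \tau_2 \to \Gamma}\ (\texttt{update\_imdest})$$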


A.1.6  A;  T  h  I  Instruction  Typing 


A;  ’I';  r  b  01  :  B4  A;4';Fbo2:B4 
A;  4';  T  b  d  :  B4  — *•  T'  Aj^r' b  I 

A;  'F;  r  b  add  d,  0\,  o2  I 


(ok_add) 


A;  4^;  r  b  o  :  sptr(ri  x  t2)  A  b  n  :  Ti 
A  b  t2  :  TD  A;  4^; r  b  d  :  sptr(r2)  — »  r'  A;  b  7 

A;  4^;  r  b  addsptr  d,o,i  I 


(ok_addsptr) 


E— i  ^ 
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(esp)  =  Ta)  A;  vp;  r  h  £  :  rr  — >  0 
;  F  b  o  :  F{esp  :  (Tr  — >  0)  x  rs}  — »  0 

A;  'P;  r  b  call  ol 


(ok.call) 


A;f;rh  I 

A;  5';  r  b  oi  :  BW  A;f;lbo2:BW 
A;>f;r  b  cmp  oi,  02  I 


(ok_cmp) 


A;tf;T 
A;^;r 
A 

A;^;r{r:ri} 

A;^;r{r:r2} 

A 
A 

A;^;T 
A;  vp;  r{r:r2,  ck:i}  b  I 


(r(r)  =  n  V  r2) 

b  Oi  :  BW 
b  o2  :  set=(a;) 
b  ri  V  r2  :  T  W 
b  01  : 
b  oi  :  t'2 


St[  A  T*£sat  <  void 
Tgat  <  void 
b  03  :  r{r:ri,  ckd}  — >  0 


A;  xF;  T  b  cmp  jcc  01,  o2,  n,  03  I 


(ok_cmp  j  cc) 


A  b  r(esp)  <  BO 
A;  'f;  r  b  halt 


(ok_halt) 


A;^;T  b  I 
A;f;rbo:r-40 
A;  \F;  r  b  jcc  n,  o;  / 


(ok_jcc) 


A;$;r  b  o  :  T  ->  0  ,  ,  .  , 

A  .j.  -r  1  ■ - T  °k-lmP) 

A;  V;  1  b  jmp  o  1 


A;  ’J;  F  b  o  :  r 

A;$;rbrf:T^r'  A;f;r'bl 

A;  ’F;  T  b  mov  d ,  o 


(okjnov) 


A;  xF;  r{r:nsw}  b  I  inits  r:mbox(ns") 
A;  \F;  r  b  malloc  n,  r  I 


(ok_malloc) 


A;$;Tbo2  :  set=(m)  A;  ^F;  T{r  :  nsw}  b  03  :  r 
Abr:Tn  A;  'f;  r{r:mbox(r  1  a;)}  b  I 
A;  *F;  T  b  mallocarr  01,  r,  n,  o2,  03  I 


(okjnallocarr) 


A  b  F(esp)  <  Ti  x  r2  A  b  n  :  Tn  A  b  r2  :  TD 
A; 'F;  r{esp:r2}  b  d  :  ri  — >  T'  As'FjT' b/ 

A;  *F;  T  b  pop  n,d  I 


(ok_pop) 


(r(esp)  =  ts) 

A;f;rbo:r  A;$;l{esp:T  x  rs}  b  I 
A;f;Fb  push  o  I 


(ok_push) 


A  b  r : TD 

A  b  T(sp)  <  (T{sp:r}  ->0)xr 
A;  *F;  T  b  ret 


(ok_ret) 


A; 'F;  Tie sp  :  nsn  x  T(esp)}  b  J 

-  .  T  v, , - - — -  ok.salloc) 

A;  *F;  T  b  salloc n  I 


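$$\frac{\Delta \vdash \Gamma(\mathit{esp}) \le \tau_1 \times \tau_2 \quad \Delta \vdash \tau_1 : T_n \quad \Delta \vdash \tau_2 : T_D \quad \Delta;\Psi;\Gamma\{\mathit{esp}:\tau_2\} \vdash I}{\Delta;\Psi;\Gamma \vdash \mathtt{sfree}\ n; I} \quad \text{(ok\_sfree)}$$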


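$$\frac{\Delta;\Psi;\Gamma' \vdash I \quad \Delta \vdash \Gamma \le \Gamma'}{\Delta;\Psi;\Gamma \vdash I} \quad \text{(ok\_coerce)}$$

$$\frac{\Delta;\Psi;\Gamma \vdash o_1 : \mathrm{BW} \quad \Delta;\Psi;\Gamma \vdash o_2 : \mathrm{BW} \quad \Delta;\Psi;\Gamma \vdash d : \mathrm{BW} \to \Gamma' \quad \Delta;\Psi;\Gamma' \vdash I}{\Delta;\Psi;\Gamma \vdash \mathtt{sub}\ d, o_1, o_2; I} \quad \text{(ok\_sub)}$$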
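A.1.7

$\Psi;\Delta \vdash I : \tau\ \mathsf{block}$, $\vdash P$  Block and Program Typing

$$\frac{\Psi;(\Delta, \alpha{:}\kappa) \vdash I : \tau\ \mathsf{block}}{\Psi;\Delta \vdash I : \forall\alpha{:}\kappa.\tau\ \mathsf{block}} \quad \text{(blockok\_forall)}
\qquad
\frac{\Delta;\Psi;\Gamma \vdash I}{\Psi;\Delta \vdash I : \Gamma \to 0\ \mathsf{block}} \quad \text{(blockok\_arrow)}$$

$$\frac{(\mathrm{dom}(\Psi) = \{\ell_1, \ldots, \ell_n\}) \quad \vdash \Psi \quad (\Psi(\ell_1) = \{\mathit{esp}:\mathrm{B0}\} \to 0) \quad \Psi;\cdot \vdash I_i : \Psi(\ell_i)\ \mathsf{block}\ \text{for } 1 \le i \le n}{\vdash \ell_1 = I_1, \ldots, \ell_n = I_n} \quad \text{(progok)}$$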


A.2  Operational  Semantics 

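A.2.1

$H, V_s, R \vdash o \leadsto v$  Operand Resolution

$$\frac{}{H, V_s, R \vdash v \leadsto v} \quad \text{(resolve\_im)}
\qquad
\frac{}{H, V_s, R \vdash r \leadsto R(r)} \quad \text{(resolve\_rco)}$$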


H,Ve,R\-o~+l  (|Vl|  =  n) 

(77(7)  =  V1@V2@V3)  {\V2\  =  to) 

(resolve.spco)  -  — — - — - - - — -  (resolve_mco) 


H,  Vs,  1?  b  esp  sptr(|!4|)  77,  14, 1?  b  to‘[o  +  n]  -w  V2 

H,  14, J?bo^  sptr(s)  (Vs  =  V'@V,  \V\  =  s,  V  =  Vi@V2@t,3,  |Vl|  =  n,  | V2 1  =  in) 


H,VS,R\~  to‘[o  +  n]  -w  14 

77,VS,77I- 01  ~*7  77,  V, ,  7?  f-  o2  -w  77 
(77(7)  =  Vi@V2<a>V3,  |Vl|  =n  +  n'B1  |V2|  =  to) 

77, 14,  R  V  m ‘ [01  +  n  +  n'  ■  o2 ]  ~^>  V2 


(resolve.zco) 


(resolve_imco) 


A.2.2 


$H, V_s, R \vdash d(v) \leadsto H', V_s', R'$  Destination Propagation

$$\frac{}{H, V_s, R \vdash r(v) \leadsto H, V_s, R\{r \mapsto v\}} \quad \text{(propagate\_rdest)}$$


(K  =  |V/|  =  «) 


77, 14, 7?  b  esp(sptr(?i))  77, 17',  7? 


- —  (propagate_spdest) 


77,  Vs,  R  \-o~*£  (77(7)  =  Vi@V2@V3,  |Vi|=n,  |V2|  =  IV) 
77,  Vs,  7?  h  lV‘[o  +  n](u)  77{7  Vi@u@V3},  Vs,  R 


(propagate_mdest) 


77,  Vs,  R  h  o  sptr(s) 

(Vs  =  V0@Vi@V2@V3,  |Vi@V2@V3|  =  s,  |Vi|  =  n,  |V2|  =  IV) 
77,  Vs,  7?  h  lV‘[o  +  n](v)  77,  V0@Vi@t;@V3, 7? 


(propagate_zdest) 


77,  Vs,  7?  b  01  7  H,VS,  R\~  o2  B 

(77(7)  =  Vi@V2@V3,  | Vl I  =  to  +  to'E,  |V2|  =  IV) 

77,  Vs,  R  b  1V‘[oi  +  to  +  to'  •  o2](w)  77 {7  1— >  Vl@i>@V3},  14,  R 


(propagate_imdest) 


Appendix  B 

Complete  MiniTALT-R  Typing  Rules 


To save space, this appendix focuses on the differences between MiniTALT and MiniTALT-R. Unless otherwise noted, all of the rules listed in Appendix A are retained in MiniTALT-R. Any rule in this appendix with the same label as a rule in Appendix A supersedes the MiniTALT rule. MiniTALT-R rules for which no MiniTALT analogues exist are given ad hoc names and labeled in SMALL CAPS, even if there is an analogue in the LF implementation of TALT.


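B.1

$\Delta \vdash c : \kappa$  Static Term Formation

$$\frac{(n \ge 0)}{\Delta \vdash n : N} \quad \textsc{(kof\_numlit)}
\qquad
\frac{\Delta \vdash t_1 : N \quad \Delta \vdash t_2 : N}{\Delta \vdash t_1 + t_2 : N} \quad \textsc{(kof\_numadd)}$$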

(K  e  {t,t*,td}) 

A  h  99  :  P  A  h  r  :  A' 

Ah  ^r:K  (KOF-GUARD) 


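B.2

$\Delta \vdash \varphi\ \mathsf{prop}$  Constraint Formula Formation

$$\frac{\Delta \vdash t_1 : N \quad \Delta \vdash t_2 : N}{\Delta \vdash t_1 \le t_2\ \mathsf{prop}} \quad \textsc{(propok\_leq)}
\qquad
\frac{\Delta \vdash t_1 : N \quad \Delta \vdash t_2 : N}{\Delta \vdash t_1 = t_2\ \mathsf{prop}} \quad \textsc{(propok\_eq)}$$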


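B.3  $\vdash c_1 = c_2$, $\varphi_1 = \varphi_2$  Static Term and Formula Equivalence

$$\frac{\varphi = \varphi' \quad \tau = \tau'}{(\varphi \Rightarrow \tau) = (\varphi' \Rightarrow \tau')} \quad \textsc{(equiv\_guard)}
\qquad
\frac{t_1 = t_1' \quad t_2 = t_2'}{t_1 + t_2 = t_1' + t_2'} \quad \textsc{(equiv\_numadd)}$$

$$\frac{t_1 = t_1' \quad t_2 = t_2'}{(t_1 \le t_2) = (t_1' \le t_2')} \quad \textsc{(fequiv\_leq)}
\qquad
\frac{t_1 = t_1' \quad t_2 = t_2'}{(t_1 = t_2) = (t_1' = t_2')} \quad \textsc{(fequiv\_eq)}$$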


t  =  t'  T  =  Tf  Tr  =  h/0f  r  ^  {axi  •  ■  j  hp} 


{eax:rax, . . . ,  ebp:rbp,  esp:rsp,  ck:t}  =  {eax:r's, . . . ,  ebp:rbp,  esp:r'p,  ck :t'} 


—  (equivr.) 
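B.4  $\Delta \vdash \varphi\ \mathsf{true}$  Constraint Truth

$$\frac{((\varphi\ \mathsf{true}) \in \Delta)}{\Delta \vdash \varphi\ \mathsf{true}} \quad \textsc{(tr\_hyp)}
\qquad
\frac{\Delta \vdash t : N}{\Delta \vdash t = t\ \mathsf{true}} \quad \textsc{(tr\_eq\_refl)}$$

$$\frac{\Delta \vdash t_1 = t_3\ \mathsf{true} \quad \Delta \vdash t_3 = t_2\ \mathsf{true}}{\Delta \vdash t_1 = t_2\ \mathsf{true}} \quad \textsc{(tr\_eq\_trans)}
\qquad
\frac{\Delta \vdash t_1 = t_2\ \mathsf{true}}{\Delta \vdash t_2 = t_1\ \mathsf{true}} \quad \textsc{(tr\_eq\_symm)}$$

$$\frac{}{\Delta \vdash m + n = \overline{m + n}\ \mathsf{true}} \quad \textsc{(tr\_add\_lit)}
\qquad
\frac{\Delta \vdash t_1 = t_1'\ \mathsf{true} \quad \Delta \vdash t_2 = t_2'\ \mathsf{true}}{\Delta \vdash t_1 + t_2 = t_1' + t_2'\ \mathsf{true}} \quad \textsc{(tr\_add\_compat)}$$

(In $\textsc{tr\_add\_lit}$, $m$ and $n$ are numerals and $\overline{m + n}$ is the numeral denoting their sum.)

$$\frac{\Delta \vdash t_1 : N \quad \Delta \vdash t_2 : N}{\Delta \vdash t_1 + t_2 = t_2 + t_1\ \mathsf{true}} \quad \textsc{(tr\_add\_commute)}
\qquad
\frac{\Delta \vdash t : N}{\Delta \vdash 0 + t = t\ \mathsf{true}} \quad \textsc{(tr\_add\_ident)}$$

$$\frac{\Delta \vdash t_i : N\ (\text{for } i = 1, 2, 3)}{\Delta \vdash (t_1 + t_2) + t_3 = t_1 + (t_2 + t_3)\ \mathsf{true}} \quad \textsc{(tr\_add\_assoc)}$$

$$\frac{\Delta \vdash t_1 = t_2\ \mathsf{true}}{\Delta \vdash t_1 \le t_2\ \mathsf{true}} \quad \textsc{(tr\_leq\_refl)}
\qquad
\frac{(m \le n)}{\Delta \vdash m \le n\ \mathsf{true}} \quad \textsc{(tr\_leq\_lit)}$$

$$\frac{\Delta \vdash t_1 \le t_3\ \mathsf{true} \quad \Delta \vdash t_3 \le t_2\ \mathsf{true}}{\Delta \vdash t_1 \le t_2\ \mathsf{true}} \quad \textsc{(tr\_leq\_trans)}
\qquad
\frac{\Delta \vdash t_1 \le t_1'\ \mathsf{true} \quad \Delta \vdash t_2 \le t_2'\ \mathsf{true}}{\Delta \vdash t_1 + t_2 \le t_1' + t_2'\ \mathsf{true}} \quad \textsc{(tr\_add\_mono)}$$

$$\frac{\Delta \vdash t + t_1 \le t + t_2\ \mathsf{true}}{\Delta \vdash t_1 \le t_2\ \mathsf{true}} \quad \textsc{(tr\_add\_inj)}
\qquad
\frac{\Delta \vdash t : N}{\Delta \vdash 0 \le t\ \mathsf{true}} \quad \textsc{(tr\_leq\_z)}$$

$$\frac{\Delta \vdash t_1 \le t_2\ \mathsf{true} \quad \Delta \vdash t_2 \le t_1\ \mathsf{true}}{\Delta \vdash t_1 = t_2\ \mathsf{true}} \quad \textsc{(tr\_leq\_antisymm)}$$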


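B.4.1  Rule For Rational Extension

$$\frac{\Delta \vdash^{+} \underbrace{t + \cdots + t}_{n} \le \underbrace{u + \cdots + u}_{n}\ \mathsf{true} \quad (n \in \{1, 2, \ldots\})}{\Delta \vdash^{+} t \le u\ \mathsf{true}} \quad \textsc{(tr\_add\_repeat)}$$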


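B.5

$\Delta \vdash \tau_1 \le \tau_2$, $\Delta \vdash \Gamma \le \Gamma'$  Subtyping

$$\frac{\Delta \vdash \varphi\ \mathsf{prop}}{\Delta \vdash \tau \le (\varphi \Rightarrow \tau)} \quad \textsc{(gen\_guard)}
\qquad
\frac{\Delta \vdash \tau : T \quad \Delta \vdash \varphi\ \mathsf{true}}{\Delta \vdash (\varphi \Rightarrow \tau) \le \tau} \quad \textsc{(guard\_elim)}$$

$$\frac{\Delta, \varphi\ \mathsf{true} \vdash \tau \le \tau'}{\Delta \vdash (\varphi \Rightarrow \tau) \le (\varphi \Rightarrow \tau')} \quad \textsc{(guard\_sub)}
\qquad
\frac{\Delta \vdash t_1 = t_2\ \mathsf{true}}{\Delta \vdash S(t_1) \le S(t_2)} \quad \textsc{(seteq\_sub)}$$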


A  h  t  :  N 


A  h  S(t)  <  Bhh 


(seteq_forget)


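$$\frac{\Delta \vdash t' \le t\ \mathsf{true} \quad \Delta \vdash \tau_{sp} \le \tau_{sp}' \quad \Delta \vdash \tau_r \le \tau_r'\ \text{for } r \in \{ax, \ldots, bp\}}{\Delta \vdash \{\mathit{eax}:\tau_{ax}, \ldots, \mathit{ebp}:\tau_{bp}, \mathit{esp}:\tau_{sp}, \mathit{ck}:t\} \le \{\mathit{eax}:\tau_{ax}', \ldots, \mathit{ebp}:\tau_{bp}', \mathit{esp}:\tau_{sp}', \mathit{ck}:t'\}} \quad \text{(sub\_rtype)}$$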


B.5.1  Unofficial Rules

$$\frac{\Delta \vdash t \le u\ \mathsf{true}}{\Delta \vdash S(u) \le \exists a{:}N.\,S(t + a)} \quad \textsc{(focus\_leq)}$$


B.6

$\Delta;\Psi;\Gamma \vdash o : \tau$  Operand Typing

$$\frac{(\Delta, \varphi\ \mathsf{true});\Psi \vdash v : \tau}{\Delta;\Psi \vdash v : \varphi \Rightarrow \tau} \quad \textsc{(oof\_guard\_intro)}$$


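B.7

$\Delta;\Psi;\Gamma \vdash I$  Instruction Typing

$$\frac{(\Gamma(\mathit{ck}) = 1 + t) \quad \Delta;\Psi;\Gamma \vdash o_1 : \mathrm{B4} \quad \Delta;\Psi;\Gamma \vdash o_2 : \mathrm{B4} \quad \Delta;\Psi;\Gamma \vdash d : \mathrm{B4} \to \Gamma' \quad \Delta;\Psi;\Gamma'\{\mathit{ck}:t\} \vdash I}{\Delta;\Psi;\Gamma \vdash \mathtt{add}\ d, o_1, o_2; I} \quad \text{(ok\_add)}$$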


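$$\frac{(\Gamma(\mathit{ck}) = 1 + t) \quad \Delta;\Psi;\Gamma \vdash o : \mathrm{sptr}(\tau_1 \times \tau_2) \quad \Delta \vdash \tau_1 : T_i \quad \Delta \vdash \tau_2 : T_D \quad \Delta;\Psi;\Gamma \vdash d : \mathrm{sptr}(\tau_2) \to \Gamma' \quad \Delta;\Psi;\Gamma'\{\mathit{ck}:t\} \vdash I}{\Delta;\Psi;\Gamma \vdash \mathtt{addsptr}\ d, o, i; I} \quad \text{(ok\_addsptr)}$$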


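$$\frac{\Delta';\Psi;\Gamma_{\mathit{ret}} \vdash I \quad \Delta' \vdash \Gamma_{\mathit{ret}} \quad \Delta;\Psi;\Gamma' \vdash o : \Gamma' \to 0 \quad (\Gamma(\mathit{ck}) = 1 + t) \quad (\Delta' = \Delta, \alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n) \quad (\Gamma' = \Gamma\{\mathit{esp} : (\forall\alpha_1{:}\kappa_1 \ldots \alpha_n{:}\kappa_n.\,\Gamma_{\mathit{ret}} \to 0) \times \Gamma(\mathit{esp}), \mathit{ck}:t\})}{\Delta;\Psi;\Gamma \vdash \mathtt{call}\ o; I} \quad \text{(ok\_call)}$$

$$\frac{(\Gamma(\mathit{ck}) = 1 + t) \quad \Delta;\Psi;\Gamma \vdash o_1 : \mathrm{int} \quad \Delta;\Psi;\Gamma \vdash o_2 : \mathrm{int} \quad \Delta;\Psi;\Gamma\{\mathit{ck}:t\} \vdash I}{\Delta;\Psi;\Gamma \vdash \mathtt{cmp}\ o_1, o_2; I} \quad \text{(ok\_cmp)}$$

$(\Gamma(\mathit{ck}) = 2 + t) \quad (\Gamma(r) = \tau_1 \vee \tau_2)$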


A;  VF;  r 
A;  VF;  r 
A 

A;^;r{r:r!} 

A;^;r{r:r2} 

A 

A 

A;  rf;  r 


b  oi  :  int 
b  02  ■  set-(x) 
b  T\  V  t2  :  T  W 
b  Oi  :  t[ 
b  Ox  :  T2 

h  T1  A  T unsat  <  void 

i  r2  A  1~sat  <  void 
b  03  :  r{r:ri,  ckd}  — > 


A;  rf;  r{r:r2,  ck:t}  b  / 


A;  'F;  F  b  cmp  jcc  o\,  o2,  ft,  03  I 


(ok_cmp jcc) 


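$$\frac{\Delta;\Psi;\Gamma \vdash o : (\Gamma\{\mathit{ck}:t\}) \to 0 \quad \Delta;\Psi;\Gamma\{\mathit{ck}:t\} \vdash I \quad (\Gamma(\mathit{ck}) = 1 + t)}{\Delta;\Psi;\Gamma \vdash \mathtt{jcc}\ n, o; I} \quad \text{(ok\_jcc)}$$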


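$$\frac{(\Gamma(\mathit{ck}) = 1 + t) \quad \Delta;\Psi;\Gamma \vdash o : (\Gamma\{\mathit{ck}:t\}) \to 0}{\Delta;\Psi;\Gamma \vdash \mathtt{jmp}\ o; I} \quad \text{(ok\_jmp)}$$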


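$$\frac{(\Gamma(\mathit{ck}) = 1 + t) \quad \Delta;\Psi;\Gamma \vdash o : \tau \quad \Delta;\Psi;\Gamma \vdash d : \tau \to \Gamma' \quad \Delta;\Psi;\Gamma'\{\mathit{ck}:t\} \vdash I}{\Delta;\Psi;\Gamma \vdash \mathtt{mov}\ d, o; I} \quad \text{(ok\_mov)}$$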


(r(ck)  =  1  + 1) 

A;  rf;  r{r:nsw,  ck:t}  b  I  inits  r:mbox(nsn) 
A;  b  malloc  n,  r  I 


(ok_malloc) 
$(\Gamma(\mathit{ck}) = 1 + t)$

A;  'I';  r  b  02  :  set=(x)  A;  ;  T{r  :  nsw}  b  03  :  r 

Ahr:  T  n  A; *J/;  r{r:mbox(r  |  a:),  ck  :!}  b  I 

- 7 —  - - -  (okjnallocarr) 

A;W;I  b  mallocarr  01,  r,  n,  02,  03  1 


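$$\frac{(\Gamma(\mathit{ck}) = 1 + t) \quad \Delta \vdash \Gamma(\mathit{esp}) \le \tau_1 \times \tau_2 \quad \Delta \vdash \tau_1 : T_n \quad \Delta \vdash \tau_2 : T_D \quad \Delta;\Psi;\Gamma\{\mathit{esp}:\tau_2\} \vdash d : \tau_1 \to \Gamma' \quad \Delta;\Psi;\Gamma'\{\mathit{ck}:t\} \vdash I}{\Delta;\Psi;\Gamma \vdash \mathtt{pop}\ n, d; I} \quad \text{(ok\_pop)}$$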


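$$\frac{(\Gamma(\mathit{ck}) = 1 + t) \quad \Delta;\Psi;\Gamma \vdash o : \tau \quad \Delta \vdash \tau : T_D \quad \Delta;\Psi;\Gamma\{\mathit{esp}:\tau \times \Gamma(\mathit{esp}), \mathit{ck}:t\} \vdash I}{\Delta;\Psi;\Gamma \vdash \mathtt{push}\ o; I} \quad \text{(ok\_push)}$$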


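$$\frac{\Delta \vdash \tau : T_D \quad (\Gamma(\mathit{ck}) = 1 + t) \quad \Delta \vdash \Gamma(\mathit{esp}) \le (\Gamma\{\mathit{esp}:\tau, \mathit{ck}:t\} \to 0) \times \tau}{\Delta;\Psi;\Gamma \vdash \mathtt{ret}} \quad \text{(ok\_ret)}$$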


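$$\frac{(\Gamma(\mathit{ck}) = 1 + t) \quad \Delta;\Psi;\Gamma\{\mathit{esp}:\mathrm{ns}^n \times \Gamma(\mathit{esp}), \mathit{ck}:t\} \vdash I}{\Delta;\Psi;\Gamma \vdash \mathtt{salloc}\ n; I} \quad \text{(ok\_salloc)}$$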


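$$\frac{(\Gamma(\mathit{ck}) = 1 + t) \quad \Delta \vdash \Gamma(\mathit{esp}) \le \tau_1 \times \tau_2 \quad \Delta \vdash \tau_1 : T_n \quad \Delta \vdash \tau_2 : T_D \quad \Delta;\Psi;\Gamma\{\mathit{esp}:\tau_2, \mathit{ck}:t\} \vdash I}{\Delta;\Psi;\Gamma \vdash \mathtt{sfree}\ n; I} \quad \text{(ok\_sfree)}$$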


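$$\frac{(\Gamma(\mathit{ck}) = 1 + t) \quad \Delta;\Psi;\Gamma \vdash o_1 : \mathrm{int} \quad \Delta;\Psi;\Gamma \vdash o_2 : \mathrm{int} \quad \Delta;\Psi;\Gamma \vdash d : \mathrm{int} \to \Gamma' \quad \Delta;\Psi;\Gamma'\{\mathit{ck}:t\} \vdash I}{\Delta;\Psi;\Gamma \vdash \mathtt{sub}\ d, o_1, o_2; I} \quad \text{(ok\_sub)}$$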


A;^;r{ck:F}  b  / 

A;f;r  b  yield;  / 


(OK_YIELD) 


A;^;r{rd:Bl+,ck:l}  b  I 

A;  T  b  03  :  Va:l\l.(w  =  v  +  a)  +■  r{r<p5(a),  ck:l}  -a  0 
A;  ^H;  T  b  01  :  S(u)  A;  'H;  T  b  o2  :  S(v)  (T(ck)  =  2  +  1) 
A;  vH;  T  b  sub  jae  rj,,  01,02,  03  / 


(OK.SUBJAE) 


B.7.1  Unofficial  Rules 


A;f;r{rd:Bff,  ck:l}  b/ 

A;  T';  T  b  03  :  r{rd:5(u  +  v),  ck:l}  — >  0 
A;^;rboi  :S(u)  A;  T  b  o2  :  5(u)  (T(ck)=2  +  1) 

A;  T1;  T  b  add  jno  r<i,  01, 02,  03  I 


(OK_ADDJNO) 


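$$\frac{(\Gamma(\mathit{ck}) = 1 + t) \quad \Delta;\Psi;\Gamma \vdash o_1 : S(u + v) \quad \Delta;\Psi;\Gamma \vdash o_2 : S(u) \quad \Delta;\Psi;\Gamma \vdash d : S(v) \to \Gamma' \quad \Delta;\Psi;\Gamma'\{\mathit{ck}:t\} \vdash I}{\Delta;\Psi;\Gamma \vdash \mathtt{sub}\ d, o_1, o_2; I} \quad \textsc{(ok\_sub\_seteq)}$$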


B.8

$\Delta;\Psi;\Gamma \vdash I\ \mathsf{inits}\ r{:}\mathrm{mbox}(\tau)$  Object Initialization


A  b  T  <  T1  X  72  X  T3 

A  b  n  :  Tn  A  b  T2  :  T to  A  b  Tj  :  T?n  (T(ck)  =  1  +  1) 
A;  T  b  o  :  t'2  A;  ^H;  T{ck:l}  b  I  inits  r:mbox(ri  x  t'2  x  73) 

A  b  mov  m‘[r  +  n],o  I  inits  rmbox(r) 


(ok_init_mov) 


$\Delta \vdash \tau \le \tau_1 \times \tau_2 \times \tau_3$

A  b  n  :  Tn  A  b  t2  :  Tto  (r(ck)  =  1  + 1) 

A  b  r(esp)  <  x  r'  A  hr^  :  Tin  A  b  t'  :  TD 
A;  r{esp:T',  ck:t}  b  /  inits  r:mbox(ri  x  t'2  x  T3) 

A  b  pop  to,  to‘[o  +  n]  I  inits  r:mbox(r) 


(ok_init_pop) 


$$\frac{\Delta;\Psi;\Gamma\{r:\mathrm{mbox}(\tau)\} \vdash I}{\Delta;\Psi;\Gamma \vdash I\ \mathsf{inits}\ r{:}\mathrm{mbox}(\tau)} \quad \text{(ok\_init\_done)}$$


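B.9

$\Psi;\Delta \vdash I : \tau\ \mathsf{block}$  Block Typing

$$\frac{\Psi;(\Delta, \varphi\ \mathsf{true}) \vdash I : \tau\ \mathsf{block}}{\Psi;\Delta \vdash I : \varphi \Rightarrow \tau\ \mathsf{block}} \quad \textsc{(blockok\_guard)}$$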
Appendix C

Rational  Semantic  Proofs 


This appendix continues the discussion of Chapter 4 to describe a somewhat more powerful variant of the TALT-R constraint logic. The result that a proof of a given formula under given hypotheses exists within this peculiar logic if and only if a certain integer program (in the proof of Theorem 4.1) has a feasible solution raises some questions. Among the most obvious is: what happens if this linear program is feasible over the rationals but not over the integers? It is not hard to convince oneself that when the program is feasible over $\mathbb{Q}$, the constraint problem from which it was derived is valid over $\mathbb{Q}$ (and hence also over $\mathbb{Z}$), for a feasible solution provides a set of multipliers by which any high school algebra student may scale the hypotheses, add them together and conclude the truth of the goal formula; to the high school student, it makes little difference whether these multipliers are integers or not.
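To make the multiplier argument concrete, the following Python sketch checks such a certificate. It is an illustration, not the thesis's implementation: it assumes that a linear polynomial is a dict from variable names to coefficients, with the key `1` holding the constant term $P(1)$, and that each hypothesis and the goal are written in the form $P \le 0$. The certificate is accepted when every multiplier is nonnegative and each coefficient of the scaled sum of hypotheses is at least the corresponding coefficient of the goal; over nonnegative variables this forces the goal to be at most the scaled sum, which is at most 0.

```python
from fractions import Fraction

CONST = 1  # key holding a polynomial's constant coefficient, i.e. P(1)

def scaled_sum(multipliers, hyps):
    """Return the polynomial sum_i y_i * H_i."""
    total = {}
    for y, hyp in zip(multipliers, hyps):
        for key, coeff in hyp.items():
            total[key] = total.get(key, 0) + Fraction(y) * coeff
    return total

def certifies(multipliers, hyps, goal):
    """Check that nonnegative multipliers y_i witness that
    (H_1 <= 0, ..., H_n <= 0) implies (goal <= 0) over nonnegative variables:
    every coefficient of sum_i y_i * H_i must be >= the goal's coefficient."""
    if len(multipliers) != len(hyps) or any(Fraction(y) < 0 for y in multipliers):
        return False
    combo = scaled_sum(multipliers, hyps)
    return all(combo.get(k, 0) >= goal.get(k, 0) for k in set(combo) | set(goal))

# Hypothesis a + a <= b + b, i.e. 2a - 2b <= 0; goal a <= b, i.e. a - b <= 0.
hyps = [{"a": 2, "b": -2}]
goal = {"a": 1, "b": -1}
print(certifies([Fraction(1, 2)], hyps, goal))  # True: multiplier 1/2
print(certifies([1], hyps, goal))               # False: coefficient of b too small
```

No integer multiplier works for this pair (the coefficient of `a` forces $y \ge 1/2$ and that of `b` forces $y \le 1/2$), which is exactly the gap between integer and rational feasibility raised above.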

In fact, by allowing noninteger coefficients in a linear combination, we all but exhaust the high school algebra student's repertoire of techniques for deriving such inequalities. Indeed, it turns out that if the restriction to integers in the characteristic linear program is dropped, then an interesting completeness property can be shown to hold. It further turns out that only one more rule must be added to the TALT-R constraint logic to get a proof system of equivalent power, that is, one capable of deriving any constraint judgment that denotes a valid entailment over the rational numbers (with one technical restriction). Those two results are the subject of this appendix.

Other than a few notational definitions, surprisingly little work is needed to prove that the linear program in the proof of Theorem 4.1 (hereafter called the characteristic linear program of the judgment $\Delta \vdash t \le u\ \mathsf{true}$) has a rational feasible solution whenever, and only when, the constraints $\Delta$ imply $t \le u$ over the nonnegative rationals. In fact, this is a simple corollary of so-called linear programming duality, a concept well understood by numerical optimization experts if not by programming language designers. For an overview of the topic I refer the reader to the popular algorithms textbook from which the following definitions, notations, and statement of the key theorem are adapted [10].


C.1  Linear Programming Duality

A  linear  program  is  an  optimization  problem  in  which  the  goal  is  to  maximize  (or  minimize)  the 
value  of  one  linear  polynomial  (the  objective  function)  subject  to  a  set  of  constraints,  each  of  which 
is  a  linear  equation  or  inequality.  A  linear  program  in  standard  form  can  be  written  like  so: 
$$\begin{array}{ll}
\text{Maximize:} & c_1 x_1 + \cdots + c_n x_n \\
\text{subject to:} & a_{11} x_1 + \cdots + a_{1n} x_n \le b_1 \\
& \qquad \vdots \\
& a_{m1} x_1 + \cdots + a_{mn} x_n \le b_m \\
\text{and} & x_i \ge 0 \text{ for } 1 \le i \le n.
\end{array}$$

The  nonnegativity  constraints  for  all  the  variables  are  usually  left  implicit.  Every  linear  program 
has  an  equivalent  standard  form. 

The  dual  of  the  linear  program  above  is  the  following  one  (not  in  standard  form): 

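$$\begin{array}{ll}
\text{Minimize:} & b_1 y_1 + \cdots + b_m y_m \\
\text{subject to:} & a_{11} y_1 + \cdots + a_{m1} y_m \ge c_1 \\
& \qquad \vdots \\
& a_{1n} y_1 + \cdots + a_{mn} y_m \ge c_n \\
\text{and} & y_i \ge 0 \text{ for } 1 \le i \le m.
\end{array}$$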

In  other  words,  to  obtain  the  dual  linear  program  from  the  original  (or  primal),  we: 

1. transpose the matrix of coefficients in the constraints, so that there are now m variables and n constraints;

2.  interchange  the  roles  of  the  constant  terms  (the  b's  in  the  primal)  and  the  objective  function 
coefficients  (the  c's  in  the  primal); 

3.  replace  maximization  with  minimization;  and 

4.  reverse  the  sense  of  each  inequality  constraint  (except  for  the  nonnegativity  constraints). 
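The four steps can be sketched as a small Python function. The representation (a list `c` of objective coefficients, a matrix `A` given as a list of rows, and a list `b` of right-hand sides) and the name `dual` are assumptions made for illustration; the function only rearranges data, so the switch to minimization and the reversed inequality senses (steps 3 and 4) live in how the result is read.

```python
def dual(c, A, b):
    """Given the primal  maximize c.x  s.t.  A x <= b, x >= 0  (m rows, n columns),
    return (objective, matrix, rhs) describing the dual
         minimize objective.y  s.t.  matrix y >= rhs, y >= 0."""
    m, n = len(A), len(c)
    if len(b) != m or any(len(row) != n for row in A):
        raise ValueError("dimension mismatch")
    A_T = [[A[i][j] for i in range(m)] for j in range(n)]  # step 1: transpose
    return b, A_T, c  # step 2: b and c trade roles; steps 3-4 change only the senses

obj, M, rhs = dual([5, 4, 3], [[2, 3, 1], [4, 1, 2]], [5, 11])
# obj == [5, 11], M == [[2, 4], [3, 1], [1, 2]], rhs == [5, 4, 3]
```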

(The  last  of  these  is  the  reason  the  dual  program  as  given  above  is  not  in  standard  form.)  The 
linear  programming  duality  theorem  states  that  the  primal  and  dual  linear  programs  have  the 
same  optimal  objective  value. 

Theorem C.1 (Linear-programming duality) For primal and dual linear programs as given above, if $x = (x_1, \ldots, x_n)$ is an optimal solution to the primal linear program and $y = (y_1, \ldots, y_m)$ is an optimal solution to the dual linear program, then

$$\sum_{j=1}^{n} c_j x_j = \sum_{i=1}^{m} b_i y_i.$$


Proof:  See  [10],  Chap.  29,  Thm.  29.10. 
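A small numerical illustration, with an LP chosen here for illustration rather than taken from the thesis: for the primal "maximize $3x_1 + 2x_2$ subject to $x_1 + x_2 \le 4$, $x_1 + 3x_2 \le 6$, $x \ge 0$", the point $x = (4, 0)$ and the dual point $y = (3, 0)$ are both feasible and have equal objective values. Since every feasible primal value is at most every feasible dual value (weak duality), this equality certifies that both points are optimal, as Theorem C.1 predicts. The helper names are hypothetical.

```python
def primal_feasible(x, A, b):
    """x >= 0 and A x <= b, row by row."""
    return all(v >= 0 for v in x) and all(
        sum(a * v for a, v in zip(row, x)) <= bi for row, bi in zip(A, b))

def dual_feasible(y, A, c):
    """y >= 0 and (A^T) y >= c; the columns of A are the rows of A^T."""
    return all(v >= 0 for v in y) and all(
        sum(a * v for a, v in zip(col, y)) >= cj for col, cj in zip(zip(*A), c))

c, A, b = [3, 2], [[1, 1], [1, 3]], [4, 6]
x, y = [4, 0], [3, 0]
primal_value = sum(ci * xi for ci, xi in zip(c, x))  # 3*4 + 2*0 = 12
dual_value = sum(bi * yi for bi, yi in zip(b, y))    # 4*3 + 6*0 = 12
assert primal_feasible(x, A, b) and dual_feasible(y, A, c)
print(primal_value, dual_value)  # 12 12
```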


C.2  Characteristic Linear Programs


An assignment over a set $A$ is a function $q : \mathrm{Var} \to A$. If $P$ is a linear polynomial (in the sense of Definition 4.2), then the application of $P$ to an assignment $q$ over $\mathbb{Q}$ is the rational number given by

$$P@q = P(1) + \sum_{x \in \mathrm{Var}} P(x)\,q(x).$$

(If $q$ is an assignment over $\mathbb{Z}$, then $P@q$ is an integer.) An assignment $q$ over the nonnegative rationals (or over the nonnegative integers) is a rational model (or an integer model, respectively) of the polynomial constraint $(P \le 0)$ if $P@q \le 0$. A model of a set of polynomial constraints is an assignment that is a model of every constraint in the set. A set of polynomial constraints is consistent over $\mathbb{Q}$ or $\mathbb{Z}$ if it has at least one rational or integer model, respectively, and inconsistent otherwise. A constraint judgment $\Delta \vdash t \le u\ \mathsf{true}$ is valid over $\mathbb{Q}$ or over $\mathbb{Z}$ if every rational or integer model, respectively, of $[\![\Delta]\!]$ is a model of $[\![t \le u]\!]$.
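These definitions translate directly into code. The sketch below is illustrative only: it assumes that a linear polynomial is a dict from variable names to coefficients with the key `1` for $P(1)$, that an assignment is a dict from variable names to numbers, and (an assumption of the sketch) that variables absent from the assignment are taken to be 0.

```python
from fractions import Fraction

CONST = 1  # key of the constant coefficient P(1)

def at(P, q):
    """P@q = P(1) + sum over variables x of P(x) * q(x)."""
    return P.get(CONST, 0) + sum(
        coeff * q.get(x, 0) for x, coeff in P.items() if x != CONST)

def is_model(q, constraints):
    """q is a model of {P <= 0 : P in constraints}: nonnegative, and P@q <= 0 for each P."""
    return all(v >= 0 for v in q.values()) and all(at(P, q) <= 0 for P in constraints)

delta = [{"a": 2, "b": -1, CONST: -1}]                 # the constraint 2a - b - 1 <= 0
print(is_model({"a": Fraction(1, 2), "b": 0}, delta))  # True: 2*(1/2) - 0 - 1 = 0
print(is_model({"a": 1, "b": 0}, delta))               # False: 2 - 0 - 1 = 1
```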


Theorem C.2 If $\Delta$ is finite and $[\![\Delta]\!]$ is consistent over $\mathbb{Q}$, then $\Delta \vdash t \le u\ \mathsf{true}$ is valid over $\mathbb{Q}$ if and only if the linear program in the proof of Theorem 4.1 is feasible over $\mathbb{Q}$.


Proof: Suppose, in all that follows, that $[\![\Delta]\!]$ is consistent over $\mathbb{Q}$. Let $J$ stand for the judgment $\Delta \vdash t \le u\ \mathsf{true}$. The structure of this proof is as follows: we construct a feasible linear program $L$ whose maximum objective value is nonpositive if and only if $J$ is valid over $\mathbb{Q}$; by the duality theorem, it follows that the dual program $L^*$ of $L$ can take on a nonpositive objective value if and only if $J$ is valid; finally we show that $L^*$ can take on a nonpositive objective value if and only if the characteristic linear program of Theorem 4.1 is feasible.

First, we construct a linear program corresponding directly to the validity of $J$. Suppose $[\![\Delta]\!] = \{H_1, \ldots, H_n\}$. Any assignment $q$ that is a model of $[\![\Delta]\!]$ must by definition satisfy the following:

$$\begin{array}{c}
H_1(1) + H_1(a_1)\,q(a_1) + \cdots + H_1(a_m)\,q(a_m) \le 0 \\
\vdots \\
H_n(1) + H_n(a_1)\,q(a_1) + \cdots + H_n(a_m)\,q(a_m) \le 0
\end{array}$$

where $a_1, \ldots, a_m$ are all of the constraint term variables appearing in $J$. $J$ is valid over the rationals if and only if for every such $q$, $[\![t \le u]\!]@q \le 0$, that is, if and only if the objective value of the linear program

$$\begin{array}{ll}
\text{Maximize:} & P(1) + P(a_1)\,x_1 + \cdots + P(a_m)\,x_m \\
\text{subject to:} & H_1(1) + H_1(a_1)\,x_1 + \cdots + H_1(a_m)\,x_m \le 0 \\
& \qquad \vdots \\
& H_n(1) + H_n(a_1)\,x_1 + \cdots + H_n(a_m)\,x_m \le 0 \\
\text{and} & x_i \ge 0 \text{ for } 1 \le i \le m
\end{array}$$

is always less than or equal to zero, where $P = [\![t \le u]\!]$. Because $[\![\Delta]\!]$ is consistent, this linear program is feasible, so $J$ is valid if and only if the optimal objective value of the program is nonpositive.

This program is not quite in standard form as defined above, because of the constant terms on the left-hand sides of the constraints. One way to obtain an equivalent program in standard form is to add a new variable, $x_{\mathbf{1}}$, to play the role of unity, and constrain it to be equal to 1. In this way we obtain the linear program $L$:

$$\begin{array}{ll}
\text{Maximize:} & P(1)\,x_{\mathbf{1}} + P(a_1)\,x_1 + \cdots + P(a_m)\,x_m \\
\text{subject to:} & H_1(1)\,x_{\mathbf{1}} + H_1(a_1)\,x_1 + \cdots + H_1(a_m)\,x_m \le 0 \\
& \qquad \vdots \\
& H_n(1)\,x_{\mathbf{1}} + H_n(a_1)\,x_1 + \cdots + H_n(a_m)\,x_m \le 0 \\
& x_{\mathbf{1}} \le 1 \\
& -x_{\mathbf{1}} \le -1 \\
\text{and} & \text{all variables} \ge 0.
\end{array}$$

This program has a nonpositive optimum if and only if the judgment $J$ is valid over $\mathbb{Q}$.
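For concreteness, here is a Python sketch that assembles the data of $L$ in the $(c, A, b)$ standard-form shape of Section C.1. The polynomial representation (dicts keyed by variable name, with key `1` for the constant coefficient) and the name `build_L` are assumptions for illustration; column 0 is the unity variable $x_{\mathbf{1}}$.

```python
def build_L(hyps, P, variables):
    """Return (c, A, b) for L: maximize c.x s.t. A x <= b, x >= 0, with
    x[0] the unity variable and x[1:] the term variables a_1, ..., a_m."""
    keys = [1] + list(variables)                  # 1 selects the constant coefficient
    c = [P.get(k, 0) for k in keys]               # objective: P(1) x_1 + sum P(a_j) x_j
    A = [[H.get(k, 0) for k in keys] for H in hyps]
    b = [0] * len(hyps)                           # H_i(1) x_1 + sum H_i(a_j) x_j <= 0
    pad = [0] * len(variables)
    A.append([1] + pad)                           # x_1 <= 1
    b.append(1)
    A.append([-1] + pad)                          # -x_1 <= -1
    b.append(-1)
    return c, A, b

# Hypothesis a - 3 <= 0 and goal a - 5 <= 0, over the single variable a.
c, A, b = build_L([{"a": 1, 1: -3}], {"a": 1, 1: -5}, ["a"])
# c == [-5, 1], A == [[-3, 1], [1, 0], [-1, 0]], b == [0, 1, -1]
```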

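The dual of the program above is:

$$\begin{array}{ll}
\text{Minimize:} & y_{\mathbf{1}} - y_{-\mathbf{1}} \\
\text{subject to:} & H_1(1)\,y_1 + \cdots + H_n(1)\,y_n + y_{\mathbf{1}} - y_{-\mathbf{1}} \ge P(1) \\
& H_1(a_1)\,y_1 + \cdots + H_n(a_1)\,y_n \ge P(a_1) \\
& \qquad \vdots \\
& H_1(a_m)\,y_1 + \cdots + H_n(a_m)\,y_n \ge P(a_m) \\
\text{and} & \text{all variables} \ge 0,
\end{array}$$

where $y_{\mathbf{1}}$ and $y_{-\mathbf{1}}$ are the dual variables for the last two constraints of $L$.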

The  constraints  of  this  linear  program  are  almost  exactly  the  same  as  those  of  the  characteristic 
linear  program  of  Theorem  4.1.  The  only  difference  is  the  presence  of  the  two  extra  terms  in 
the  first  line  of  this  dual  linear  program,  which  are  missing  in  the  characteristic  linear  program. 
Therefore,  we  can  say  that  the  characteristic  linear  program  is  feasible  if  and  only  if  there  is  a 
feasible  solution  to  the  present  program  with  objective  value  zero. 

Now, if the judgment $J$ is valid over the rationals, then the objective value of the primal linear program is bounded above by zero, so its maximum objective value is nonpositive. This means that the minimum objective value of the dual linear program is also nonpositive. But the objective variables $y_{\mathbf{1}}$ and $y_{-\mathbf{1}}$ occur in the constraints only in the first line, where the objective function $y_{\mathbf{1}} - y_{-\mathbf{1}}$ appears positively on the left of a $\ge$. Hence, if there is a feasible solution with negative objective value, increasing $y_{\mathbf{1}}$ until the objective reaches zero yields a feasible solution with objective value zero. This latter solution corresponds to a solution to the characteristic linear program.

Conversely, if the characteristic system of inequalities is satisfiable, then by setting $y_{\mathbf{1}} = y_{-\mathbf{1}} = 0$ we obtain a feasible solution to the dual program with objective value zero. There are now two cases: either the dual program is unbounded, or it has a minimum objective value that is nonpositive. In the latter case, the maximum objective value of the primal program is nonpositive and so $J$ is valid over the rationals. As for the unbounded case, it is a fact that for any pair of dual linear programs, the objective value of any feasible solution to the maximization problem is less than or equal to that of any feasible solution to the minimization problem; it follows that if either one is unbounded, then the other is infeasible. Therefore, if the dual program is unbounded, then $L$ is infeasible, so $[\![\Delta]\!]$ is inconsistent, contradicting our assumption to the contrary.

We conclude that $J$ is valid over $\mathbb{Q}$ if and only if the characteristic system of inequalities has a rational solution.

End  of  Proof. 
C.3  Rational Semantic Proofs

Now that I have shown the connection between the validity of a truth judgment and the satisfiability of its characteristic inequalities, we can work backwards through the development of Section 4.2 to discover an extension of the TALT-R constraint logic capable of deriving all those judgments that are true in that sense. The first step is to define a new notion of semantic proof that admits a rational version of the proof of Theorem 4.1.

Definition C.1 A rational semantic proof of $(P \le 0)$ in context $\Delta$ is a semantic proof of $(qP \le 0)$ in $\Delta$ for some positive integer $q$.

Lemma C.1 There exists a rational semantic proof of $[\![t \le u]\!]$ in context $\Delta$ if and only if there is a rational solution to the characteristic inequalities of $\Delta \vdash t \le u\ \mathsf{true}$.

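Proof Sketch: Since the left-hand sides of those inequalities are homogeneous in the unknowns and the right-hand sides consist of the coefficients in $P$, scaling any solution to the inequalities for $P$ by a positive integer $q$ gives a solution to the inequalities for $qP$, and dividing a solution for $qP$ by $q$ gives a solution for $P$. Taking $q$ to be a common denominator of a rational solution for $P$ makes the scaled solution integral.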

End  of  Sketch. 
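The scaling step of this sketch is easy to mechanize. In the Python fragment below (an illustration only; `clear_denominators` and `satisfies` are hypothetical helpers, and `math.lcm` with several arguments requires Python 3.9 or later), a rational solution $y$ for $P$ is multiplied by the least common multiple $q$ of its denominators, and the result is checked against the inequalities $\sum_i H_i(k)\,y_i \ge P(k)$ for $qP$. Nonnegativity of $y$ is not rechecked, since scaling by a positive $q$ preserves it.

```python
from fractions import Fraction
from math import lcm

def clear_denominators(ys):
    """Return (q, [q*y for y in ys]) with q the least positive integer
    that makes every scaled value an integer."""
    fracs = [Fraction(y) for y in ys]
    q = lcm(*(f.denominator for f in fracs)) if fracs else 1
    return q, [int(f * q) for f in fracs]

def satisfies(ys, hyps, P, keys):
    """Characteristic inequalities: sum_i H_i(k) * y_i >= P(k) for each key k."""
    return all(sum(H.get(k, 0) * y for H, y in zip(hyps, ys)) >= P.get(k, 0)
               for k in keys)

hyps, P, keys = [{"a": 2, "b": -2}], {"a": 1, "b": -1}, [1, "a", "b"]
q, scaled = clear_denominators([Fraction(1, 2)])   # q == 2, scaled == [1]
qP = {k: q * v for k, v in P.items()}
print(satisfies([Fraction(1, 2)], hyps, P, keys), satisfies(scaled, hyps, qP, keys))
# True True
```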

Lemma C.2 If there exist rational semantic proofs of $(P \le 0)$ and $(Q \le 0)$ in context $\Delta$, then there exists a rational semantic proof of $(P + Q \le 0)$ in $\Delta$.

Proof: Suppose $\Delta \models M : pP \le 0$ and $\Delta \models N : qQ \le 0$, where $p$ and $q$ are positive integers. If $M = (A, F)$, then define $qM$ to consist of the multiset $qA$ containing $q$ copies of $A$ and the polynomial $qF$, and define $pN$ similarly. Then $\Delta \models qM : pqP \le 0$ and $\Delta \models pN : pqQ \le 0$. By Lemma 4.12, there is a semantic proof of $pq(P + Q) \le 0$ in $\Delta$, which is a rational semantic proof of $(P + Q) \le 0$.

End  of  Proof. 


C.4  Augmented  Syntactic  Proof  System 

Finally, I modify the TALT-R constraint logic to correspond to this new semantic proof theory by adding one additional rule schema:

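$$\frac{\Delta \vdash^{+} \underbrace{t + \cdots + t}_{n} \le \underbrace{u + \cdots + u}_{n}\ \mathsf{true} \quad (n \in \{1, 2, \ldots\})}{\Delta \vdash^{+} t \le u\ \mathsf{true}}$$

(I distinguish the augmented system from the original by writing $\vdash^{+}$ in place of $\vdash$.)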

The  new  rule  essentially  allows  for  the  high-school  algebra  operation  of  dividing  both  sides 
of  an  inequality  by  a  constant  positive  integer,  provided  this  does  not  produce  any  nonintegral 
coefficients  on  either  side.  Of  course,  there  is  no  multiplication  in  the  constraint  term  language, 
so  the  "division"  is  really  the  removal  of  repeated  addition.  Together  with  the  ability  to  add 
inequalities  (the  monotonicity  rule),  this  allows  hypotheses  in  a  proof  to  be  scaled  by  any  rational 
factor  so  long  as  the  resulting  formula  is  expressible  using  integer  coefficients.  The  augmented 
system  of  syntactic  proof  rules  is  equivalent  to  the  semantic  proof  theory  just  defined. 
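The following Python sketch illustrates why removing repeated addition is harmless at the level of polynomials, the fact recorded as Lemma C.3. It assumes, for illustration only, that constraint terms are nested tuples built from `("lit", n)`, `("var", name)` and `("add", t1, t2)`, and that $[\![t \le u]\!]$ is read as the polynomial for $t$ minus the polynomial for $u$; both choices are assumptions of the sketch.

```python
def add_poly(p, r, scale=1):
    """Return p + scale * r, dropping zero coefficients (key 1 = constant)."""
    out = dict(p)
    for k, v in r.items():
        out[k] = out.get(k, 0) + scale * v
    return {k: v for k, v in out.items() if v != 0}

def poly(t):
    """Translate a constraint term into a linear polynomial."""
    tag = t[0]
    if tag == "lit":
        return {1: t[1]} if t[1] != 0 else {}
    if tag == "var":
        return {t[1]: 1}
    if tag == "add":
        return add_poly(poly(t[1]), poly(t[2]))
    raise ValueError(f"unknown term {t!r}")

def leq(t, u):
    """The constraint [t <= u], read as the polynomial t - u (assumed convention)."""
    return add_poly(poly(t), poly(u), scale=-1)

def repeat(t, n):
    """The term t + ... + t with n >= 1 copies."""
    s = t
    for _ in range(n - 1):
        s = ("add", s, t)
    return s

t, u = ("add", ("var", "a"), ("lit", 2)), ("var", "b")
print(leq(t, u))                        # {'a': 1, 1: 2, 'b': -1}
print(leq(repeat(t, 3), repeat(u, 3)))  # {'a': 3, 1: 6, 'b': -3}
```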
Lemma C.3 For any terms $t$ and $u$ and positive integer $n$,

$$[\![\underbrace{t + \cdots + t}_{n} \le \underbrace{u + \cdots + u}_{n}]\!] = n[\![t \le u]\!].$$

Proof:  Omitted. 

Lemma C.4 (Soundness of Rational Semantic Proof) If there is a rational semantic proof of $[\![t \le u]\!]$ in context $\Delta$, then $\Delta \vdash^{+} t \le u\ \mathsf{true}$.

Proof: Suppose $\Delta \models M : q[\![t \le u]\!]$. By Lemma C.3, $\Delta \models M : [\![t + \cdots + t \le u + \cdots + u]\!]$, where each sum has $q$ copies of the term. By Lemma 4.11, $\Delta \vdash t + \cdots + t \le u + \cdots + u\ \mathsf{true}$ and thus $\Delta \vdash^{+} t + \cdots + t \le u + \cdots + u\ \mathsf{true}$. By the new rule, $\Delta \vdash^{+} t \le u\ \mathsf{true}$.

End  of  Proof. 

Lemma C.5 (Completeness of Rational Semantic Proof) If $\Delta \vdash^{+} t \le u\ \mathsf{true}$, then there is a rational semantic proof $M$ of $[\![t \le u]\!]$ in context $\Delta$.

Proof Sketch: Analogous to Lemma 4.13, using Lemma C.2 in place of Lemma 4.12, and with one new case.

Case:

$$\frac{\Delta \vdash^{+} \underbrace{t + \cdots + t}_{n} \le \underbrace{u + \cdots + u}_{n}\ \mathsf{true} \quad (n \in \{1, 2, \ldots\})}{\Delta \vdash^{+} t \le u\ \mathsf{true}}$$

By the induction hypothesis, $\Delta \models M : q[\![t + \cdots + t \le u + \cdots + u]\!]$ for some positive integer $q$.

By Lemma C.3, $\Delta \models M : qn[\![t \le u]\!]$.

Thus $[\![t \le u]\!]$ is rationally semantically provable, as desired.

End  of  Sketch. 

Theorem C.3 (Characterization of Augmented System)

1. It is decidable whether or not $\Delta \vdash^{+} \varphi\ \mathsf{true}$.

2. If $\Delta \vdash^{+} \varphi\ \mathsf{true}$, then $\Delta \vdash \varphi\ \mathsf{true}$ is valid over $\mathbb{Q}$.

3. If $\Delta$ is consistent over $\mathbb{Q}$ and $\Delta \vdash \varphi\ \mathsf{true}$ is valid over $\mathbb{Q}$, then $\Delta \vdash^{+} \varphi\ \mathsf{true}$.

Proof:

1. By Lemmas C.1, C.4 and C.5, it suffices to decide whether there is a rational solution to the characteristic inequalities. This can be accomplished using any linear programming algorithm.

2. Suppose $\Delta \vdash^{+} \varphi\ \mathsf{true}$. By Lemma C.5, there is a rational semantic proof of $[\![\varphi]\!]$ in $\Delta$. By Lemma C.1, there is a rational solution to the characteristic inequalities. By Theorem C.2, either $\Delta \vdash \varphi\ \mathsf{true}$ is valid over $\mathbb{Q}$ or $\Delta$ is inconsistent over $\mathbb{Q}$. But if $\Delta$ has no rational model, then $\Delta \vdash \varphi\ \mathsf{true}$ is vacuously valid over $\mathbb{Q}$.

3. Suppose $\Delta$ is consistent over $\mathbb{Q}$ and $\Delta \vdash \varphi\ \mathsf{true}$ is valid over $\mathbb{Q}$. By Theorem C.2, there is a rational solution to the characteristic inequalities; by Lemma C.1, there is a rational semantic proof of $[\![\varphi]\!]$. By Lemma C.4, $\Delta \vdash^{+} \varphi\ \mathsf{true}$.

End  of  Proof. 
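Part 1 leaves the choice of linear programming algorithm open. As one illustrative option (a swapped-in technique, not necessarily what the thesis uses), the Python sketch below decides rational feasibility of the characteristic inequalities by Fourier-Motzkin elimination with exact `Fraction` arithmetic, and therefore decides $\Delta \vdash^{+} t \le u\ \mathsf{true}$. It is simple, but it can blow up exponentially in the number of variables, so a simplex or interior-point solver would be the practical choice. As in the earlier sketches, the dict representation of polynomials (key `1` for the constant coefficient) is an assumption, and $\varphi$ is taken to be of the form $t \le u$ with $P = [\![t \le u]\!]$.

```python
from fractions import Fraction

def fm_feasible(rows):
    """Decide whether a system of inequalities a.y <= b (a list of
    (coefficient_list, rhs) pairs, all lists of equal length) has a rational
    solution, by eliminating one variable at a time (Fourier-Motzkin)."""
    rows = [([Fraction(a) for a in coeffs], Fraction(rhs)) for coeffs, rhs in rows]
    nvars = len(rows[0][0]) if rows else 0
    for k in range(nvars):
        pos = [r for r in rows if r[0][k] > 0]
        neg = [r for r in rows if r[0][k] < 0]
        rest = [r for r in rows if r[0][k] == 0]
        for ap, bp in pos:
            for an, bn in neg:
                # Scale so variable k has coefficients +1 and -1, then add the rows.
                sp, sn = 1 / ap[k], -1 / an[k]
                rest.append(([sp * x + sn * z for x, z in zip(ap, an)], sp * bp + sn * bn))
        rows = rest
    # Every variable is gone, so each remaining row reads 0 <= rhs.
    return all(rhs >= 0 for _, rhs in rows)

def characteristic_feasible(hyps, P, variables):
    """Feasibility of: sum_i H_i(k) y_i >= P(k) for k in {1} + variables, and y >= 0.
    Each >= row is negated into <= form before elimination."""
    n = len(hyps)
    rows = [([-H.get(k, 0) for H in hyps], -P.get(k, 0)) for k in [1] + list(variables)]
    rows += [([-1 if j == i else 0 for j in range(n)], 0) for i in range(n)]
    return fm_feasible(rows)

# 2a <= 3 entails a <= 2 over the nonnegative rationals, but not a <= 1.
hyps = [{"a": 2, 1: -3}]
print(characteristic_feasible(hyps, {"a": 1, 1: -2}, ["a"]))  # True
print(characteristic_feasible(hyps, {"a": 1, 1: -1}, ["a"]))  # False
```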


Appendix  D 

Typing  Rules  for  Lilt 


A  b  <F  A  b  A  AbT  AbH 


A  b  Ti  :  T  for  1  <  i  <  n  A  b  7j  btype  for  1  <  i  <  n 
A  b  (fl'.Ti,  .  .  .  ,  fn'Tn)  A  b  (eZZi:7i,  .  .  .  ,in'7n) 


A  b  Ti  :T  for  1  <  i  <  n 
A  b  [si:ri, . . .  ,sn:Tn] 


_  AbS  A  b  r 

Ab-  Ab»,r 


a  b  r  <  r' 

A  b  •  handles  F  Ab  (E,  T')  handles  T 


A  b  S  handles  T 


Abri<T2  A  b  Ti  <  r2  A  b  Si  <  S2 


A  b  n  =  r2  :  T  A  b  t\  :  T  A  b  r2  =  ns  :  T 
A  b  T\  <  r2  A  b  ti  <  r2 


A  b  Ti  <  r/  for  1  <  i  <  n 
A  b  ,  sn\Tn]  <  [si:t{,  . . . ,  sn\T'n} 


A  b  Si  <  S2  A  b  r2  <  Ti 
Ab-<-  Ab(s1,r1)<(s2,r2) 


A  b  c  :  k 


(( a:k )  £  A) 

Aba:/c  Abns:T  Ab  int  :  T  A  b  bool  :  T  A  b  unit  :  T 


A  b  n  :  T  for  1  <  i  <  k 
A  b  (n,  •  •  •  ,rfe)  :  T 


{ij  7^  for  j  ^  fc) 

A  b  Ti  :  T  for  1  <  i  <  k 

A  b  [iy.Ti, . . . ,  in'Tn]  :  T 


A  b  r  :  T 

A  b  Tj  :  T  for  1  <  i  <  n 
A  b  (n,. . .  ,rn)  ->•  r  :  T 
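$$\frac{\Delta \vdash \tau : T}{\Delta \vdash \tau\ \mathrm{array} : T}
\qquad
\frac{\Delta, \alpha{:}T \vdash \tau : T}{\Delta \vdash \mu\alpha.\tau : T}$$

$$\frac{\Delta, \alpha_1{:}k_1, \ldots, \alpha_n{:}k_n \vdash \tau : T}{\Delta \vdash \forall\alpha_1{:}k_1, \ldots, \alpha_n{:}k_n.\tau : T}
\qquad
\frac{\Delta, \alpha_1{:}k_1, \ldots, \alpha_n{:}k_n \vdash \tau : T}{\Delta \vdash \exists\alpha_1{:}k_1, \ldots, \alpha_n{:}k_n.\tau : T}$$

$$\frac{\Delta, \alpha{:}k_1 \vdash c : k_2}{\Delta \vdash \lambda\alpha{:}k_1.c : k_1 \to k_2}
\qquad
\frac{\Delta \vdash c_1 : k_2 \to k \quad \Delta \vdash c_2 : k_2}{\Delta \vdash c_1\,c_2 : k}$$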


A  b  7  btype 

(dom(A)  n  dom(A')  =  0) 

(dom(A)  (~l  dom(A')  =  0) 

(A,A')bS  (A,A')br 

(A,A')bS  (A,A')br 

A  b  lbl( A';  E;  T)  btype 

A  b  hnd( A';  S;  T)  btype 

A  b  ci  =  C2  :  k 

(( a:k )  €  A) 

Aba  =  a:l  Abns  =ns  :T  Ab  int  =  int  :  T  A  b  bool  =  bool  :  T 

A  b  Ti  =  t[\T  for  l  <i  <k 

A  b  unit  =  unit  :  T  A  b  (n, . . . ,  Tfc)  =  (r{ , . . . ,  r^}  :  T 


{ij  7^  ifc  for  y  /c) 

A  b  r,:  =  r?-  :  T  for  1  <  i  <  k 

A  b  [n:ris . .  .,in-Tn]  =  [h‘-T\,  ■  ■  ■  An-r'n\  :  T 


A  b  t  =  t'  :  T 


A  b  r  =  t'  :  T 
A  b  r*  =  r?-  :  T  for  1  <  i  <  n 

A  b  (n, . . . ,  Tn)  ->  t  =  (t{ , . . . ,  T'n)  t'  :  T 
A,  a:T  \~  t  =  t'  :  T 


A  b  t  array  =  t'  array  :  T  A  b  jia.T  =  iia.T1  :  T 


_ A,ai:fci, . . . ,  Qin'-kn  b  t  =  t'  :  T _ 

A  b  Yar.ki, . . .  ,an:kn.T  =  Vay.ki, . .  ,,an:kn.T'  :  T 


A,  a:k\  b  c  =  d  :  Ay 
A  b  Act:  Ay  .c  =  \a:k\.c'  :  Ay  — >  Ay 


A  b  c2  =  4  :  Ay 

A,  ay:Ay , . . . ,  an:kn  b  r  =  t'  :  T  A  b  ci  =  c',  :  Ay  —  A' 

A  b  Bay:  Ay, . . .  ,an:kn.T  =  3cty:Ay, . . .  ,an:kn.r'  :  T  A  b  c\  C2  =  dx  c'2  :  A; 


A,  a:Ay  b  ci  :  k  A  b  C2  : 

A  b  (Aa:Ay.ci)  cy  =  ci[c2/a]  :  k 


A  b  g  :  n  =$■  T2 


A  b  Ci  :  ki  for  1  <  i  <  n 

A  b  id  :  T  ^  r  Ab  [ci, . . .  ,cn\  :  Va^Ay, . .  .,an:kn.r  =>  t[c  1, . . . ,cn/ai , . . .  ,an\ 
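$$\frac{\Delta \vdash \tau = \mu\alpha.\tau' : T}{\Delta \vdash \mathrm{roll}_\tau : \tau'[\tau/\alpha] \Rightarrow \tau}
\qquad
\frac{\Delta \vdash \mu\alpha.\tau : T}{\Delta \vdash \mathrm{unroll} : \mu\alpha.\tau \Rightarrow \tau[\mu\alpha.\tau/\alpha]}$$

$$\frac{\Delta \vdash \tau = \exists\alpha_1{:}k_1, \ldots, \alpha_n{:}k_n.\tau' : T \quad \Delta \vdash c_i : k_i\ \text{for } 1 \le i \le n}{\Delta \vdash \mathrm{pack}[\tau, c_1, \ldots, c_n] : \tau'[c_1, \ldots, c_n/\alpha_1, \ldots, \alpha_n] \Rightarrow \tau}
\qquad
\frac{\Delta \vdash q : \tau_1' \Rightarrow \tau_2' \quad \Delta \vdash \tau_i = \tau_i' : T\ \text{for } i = 1, 2}{\Delta \vdash q : \tau_1 \Rightarrow \tau_2}$$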


$;  A;  T  h  r  :  r 

(r(s)  =  r)  _  _  _ 

<&;  A;  r  h  .s  :  r  $;A;Thn:  int  <1>;  A;  T  h  tt  :  bool  A;  T  h  f  f  :  bool 

_  ($(/)  =  t)  A;T  h  v  :  r2  A  h  q  :  r2  =4>  r 

<5;  A;  r  h  *  :  unit  $;A;fh  /  :r  $;  A;f  h  q@v  :  r 


$;  A;  r  h  r  :  t'  Ah  t'  =  t 
$;  A;  T  h  r  :  r 


<f>;  A;  T  h  Vi  :  r,  for  0  <  i  <  k 
$;  A;  T  h  (r0,  ■■■,vk)  :  (r0, . . . ,  rfe) 


(op  :  (n, . . . ,  rfc)  -»  r)  <f>;  A; £  h  Vj  :  n  for  1  <  i  <  k 
‘f;  A;Th  op(oi, . . .  ,vk)  :  r 

A;  T  h  v  :  (to,  . . . ,  Tk)  <!>;  A;  T  h  Vi  :  r  for  1  <  i  <  n 
A;T  h  7 TiV  \Ti  <f>;  A;  r  h  {m,  ...,vn}:T  array 


A  h  T  =  [.  .  .  JlTj,..  .] 

$;A;rh  v.Tj  $;A;rhr[i:r] 

A;T  h  injT(j»  :  r  $;  A;T  h  outj(r)  :  r 


$;  A;  T  h  cond  cond 

<!>;  A;  r  h  r,  :  int  for  i  =  1,2 
$;  A;  T  h  v\  =  V'2  cond 


<f>;  A;  r  h  Vi  :  int  for  i  =  1,2 
$;  A;  T  h  ui  <  V2  cond 


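$$\Phi;\Delta;\Lambda;\Sigma;\Gamma;\tau \vdash e$$

$$\frac{\Phi;\Delta;\Gamma \vdash v : \tau}{\Phi;\Delta;\Lambda;\Sigma;\Gamma;\tau \vdash \mathtt{return}\ v} \qquad \frac{\Phi;\Delta;\Gamma \vdash v : \tau_{\mathit{exn}} \quad \Delta \vdash \Sigma\ \mathrm{handles}\ \Gamma}{\Phi;\Delta;\Lambda;\Sigma;\Gamma;\tau \vdash \mathtt{raise}\ v}$$

$$\frac{(\Lambda(\ell) = \mathrm{lbl}(\alpha_1{:}\kappa_1,\ldots,\alpha_n{:}\kappa_n;\Sigma';\Gamma')) \quad \Delta \vdash c_i : \kappa_i\ (1 \le i \le n) \quad \Delta \vdash \Gamma \le \Gamma'[\vec{c}/\vec{\alpha}] \quad \Delta \vdash \Sigma \le \Sigma'[\vec{c}/\vec{\alpha}]}{\Phi;\Delta;\Lambda;\Sigma;\Gamma;\tau \vdash \mathtt{goto}\ \ell[c_1,\ldots,c_n]}$$

$$\frac{\Phi;\Delta;\Gamma \vdash v : \tau' \quad \Phi;\Delta;\Lambda;\Sigma;\Gamma[s \mapsto \tau'];\tau \vdash e}{\Phi;\Delta;\Lambda;\Sigma;\Gamma;\tau \vdash \mathtt{let}\ s = v\ \mathtt{in}\ e}$$

$$\frac{\Phi;\Delta;\Gamma \vdash v : (\tau_1',\ldots,\tau_n') \to \tau'' \quad \Phi;\Delta;\Gamma \vdash v_i : \tau_i'\ (1 \le i \le n) \quad \Delta \vdash \Sigma\ \mathrm{handles}\ \Gamma \quad \Phi;\Delta;\Lambda;\Sigma;\Gamma[s \mapsto \tau''];\tau \vdash e}{\Phi;\Delta;\Lambda;\Sigma;\Gamma;\tau \vdash \mathtt{let}\ s = v(v_1,\ldots,v_n)\ \mathtt{in}\ e}$$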
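$$\frac{\Phi;\Delta;\Gamma \vdash v : (\tau_0,\ldots,\tau_m) \quad \Phi;\Delta;\Gamma \vdash v' : \tau_i \quad \Phi;\Delta;\Lambda;\Sigma;\Gamma;\tau \vdash e}{\Phi;\Delta;\Lambda;\Sigma;\Gamma;\tau \vdash \mathtt{let}\ \pi_i\,v := v'\ \mathtt{in}\ e}$$

$$\frac{\Phi;\Delta;\Gamma \vdash v : \tau'\ \mathit{array} \quad \Phi;\Delta;\Gamma \vdash v' : \mathit{int} \quad \Delta \vdash \Sigma\ \mathrm{handles}\ \Gamma \quad \Phi;\Delta;\Lambda;\Sigma;\Gamma[s \mapsto \tau'];\tau \vdash e}{\Phi;\Delta;\Lambda;\Sigma;\Gamma;\tau \vdash \mathtt{let}\ s = \mathtt{sub}(v,v')\ \mathtt{in}\ e}$$

$$\frac{\Phi;\Delta;\Gamma \vdash v_1 : \tau'\ \mathit{array} \quad \Phi;\Delta;\Gamma \vdash v_2 : \mathit{int} \quad \Phi;\Delta;\Gamma \vdash v_3 : \tau' \quad \Delta \vdash \Sigma\ \mathrm{handles}\ \Gamma \quad \Phi;\Delta;\Lambda;\Sigma;\Gamma;\tau \vdash e}{\Phi;\Delta;\Lambda;\Sigma;\Gamma;\tau \vdash \mathtt{let}\ \mathtt{sub}(v_1,v_2) := v_3\ \mathtt{in}\ e}$$

$$\frac{\Phi;\Delta;\Gamma \vdash v : [F, i{:}\tau'] \quad \Phi;\Delta;\Lambda;\Sigma;\Gamma[s \mapsto [i{:}\tau']];\tau \vdash e_1 \quad \Phi;\Delta;\Lambda;\Sigma;\Gamma[s \mapsto [F]];\tau \vdash e_2}{\Phi;\Delta;\Lambda;\Sigma;\Gamma;\tau \vdash \mathtt{case}\ v\ \mathtt{of}\ \mathtt{inj}_i(s) \Rightarrow e_1\ \mathtt{else}\ e_2}$$

$$\frac{\Phi;\Delta;\Gamma \vdash v : \exists\alpha_1{:}\kappa_1,\ldots,\alpha_n{:}\kappa_n.\tau' \quad \Phi;(\Delta,\alpha_1{:}\kappa_1,\ldots,\alpha_n{:}\kappa_n);\Lambda;\Sigma;\Gamma[s \mapsto \tau'];\tau \vdash e}{\Phi;\Delta;\Lambda;\Sigma;\Gamma;\tau \vdash \mathtt{let}\ (\alpha_1,\ldots,\alpha_n,s) = \mathtt{unpack}\ v\ \mathtt{in}\ e}$$

$$\frac{\Phi;\Delta;\Gamma \vdash \mathit{cond}\ \mathrm{cond} \quad \Phi;\Delta;\Lambda;\Sigma;\Gamma;\tau \vdash e_1 \quad \Phi;\Delta;\Lambda;\Sigma;\Gamma;\tau \vdash e_2}{\Phi;\Delta;\Lambda;\Sigma;\Gamma;\tau \vdash \mathtt{if}\ \mathit{cond}\ \mathtt{then}\ e_1\ \mathtt{else}\ e_2} \qquad \frac{\Phi;\Delta;\Lambda;\Sigma;\Gamma;\tau \vdash e}{\Phi;\Delta;\Lambda;(\Sigma,\Gamma');\Gamma;\tau \vdash \mathtt{pophandler}\ \mathtt{in}\ e}$$

$$\frac{(\Lambda(\ell) = \mathrm{hnd}(\alpha_1{:}\kappa_1,\ldots,\alpha_n{:}\kappa_n;\Sigma';\Gamma')) \quad \Delta \vdash c_i : \kappa_i\ (1 \le i \le n) \quad \Delta \vdash \Sigma \le \Sigma'[\vec{c}/\vec{\alpha}] \quad \Phi;\Delta;\Lambda;(\Sigma,\Gamma'[\vec{c}/\vec{\alpha}]);\Gamma;\tau \vdash e}{\Phi;\Delta;\Lambda;\Sigma;\Gamma;\tau \vdash \mathtt{pushhandler}\ \ell[c_1,\ldots,c_n]\ \mathtt{in}\ e}$$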
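The pushhandler and pophandler rules treat Σ as a stack of handler environments. pushhandler ℓ[c1, ..., cn] pushes Γ'[c/α]. pophandler is typable only when Σ has the form (Σ, Γ'), and it removes that top entry. The hypothetical Python function below tracks only this stack discipline through a straight-line sequence of operations. It takes each pushed environment as already instantiated, and it omits the Δ ⊢ Σ ≤ Σ'[c/α] and "handles" premises.

```python
def run_handler_ops(ops, sigma=()):
    """Thread the handler stack Σ through pushhandler / pophandler.

    ops   : sequence of ("push", env) or ("pop",); env stands for the
            already-instantiated handler environment Γ'[c/α].
    sigma : initial stack, bottom first.
    Returns the final stack; raises ValueError when pophandler meets an
    empty stack, since its rule needs Σ to have the form (Σ, Γ').
    """
    stack = list(sigma)
    for op in ops:
        if op[0] == "push":
            stack.append(op[1])
        elif op[0] == "pop":
            if not stack:
                raise ValueError("pophandler with an empty handler stack")
            stack.pop()
        else:
            raise ValueError(f"unknown operation {op[0]!r}")
    return tuple(stack)
```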


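$$\Phi;\Delta;\Lambda;\tau \vdash B : \gamma$$

$$\frac{\Delta,\Delta' \vdash \Sigma \quad \Delta,\Delta' \vdash \Gamma \quad \Phi;(\Delta,\Delta');\Lambda;\Sigma;\Gamma;\tau \vdash e}{\Phi;\Delta;\Lambda;\tau \vdash \mathrm{block}(\Delta';\Sigma;\Gamma).e : \mathrm{lbl}(\Delta';\Sigma;\Gamma)} \qquad \frac{\Delta,\Delta' \vdash \Sigma \quad \Delta,\Delta' \vdash \Gamma \quad \Phi;(\Delta,\Delta');\Lambda;\Sigma;\Gamma[s \mapsto \tau_{\mathit{exn}}];\tau \vdash e}{\Phi;\Delta;\Lambda;\tau \vdash \mathrm{hndl}(\Delta';\Sigma;\Gamma;s).e : \mathrm{hnd}(\Delta';\Sigma;\Gamma)}$$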


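$$\Phi \vdash F : \tau$$

$$\frac{\vdash \Delta \quad \Delta \vdash \Gamma_{\mathit{arg}} \quad \Delta \vdash \tau : T \quad \Delta \vdash \Lambda \quad \Phi;\Delta;\Lambda;\cdot;\Gamma;\tau \vdash e \quad \Phi;\Delta;\Lambda;\tau \vdash B_i : \Lambda(\ell_i)\ (1 \le i \le m)}{\Phi \vdash \mathrm{func}(\Delta;\Gamma_{\mathit{arg}};\tau).(\mathtt{enter}(s_1,\ldots,s_n).e,\ \ell_1 = B_1,\ldots,\ell_m = B_m) : \forall\Delta.(\tau_1,\ldots,\tau_p) \to \tau}$$

where

$\Gamma_{\mathit{arg}} = [s_1'{:}\tau_1,\ldots,s_p'{:}\tau_p]$,

$\Gamma = [s_1'{:}\tau_1,\ldots,s_p'{:}\tau_p,\ s_1{:}\mathit{ns},\ldots,s_n{:}\mathit{ns}]$,

each $B_i$ is either $\mathrm{block}(\Delta_i;\Sigma_i;\Gamma_i).e$ or $\mathrm{hndl}(\Delta_i;\Sigma_i;\Gamma_i;s).e$, and $\mathrm{dom}(\Gamma_i) = \mathrm{dom}(\Gamma)$ for each $i$.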
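The side conditions of the function rule are bookkeeping that is easy to mechanize. Γ extends the parameter environment with each local s_i at type ns. Every block environment Γ_i must have exactly the domain of Γ. The function receives the type ∀Δ.(τ1, ..., τp) → τ. The Python sketch below shows that bookkeeping. Its representations are assumptions, and it rejects a name bound twice in Γ, an extra check that the rule does not state. It does not check e or the blocks themselves.

```python
NS = "ns"  # the type the rule writes ns, given to each local on entry

def initial_env(params, local_vars):
    """Γ = [s'1:τ1, ..., s'p:τp, s1:ns, ..., sn:ns] from Γarg and enter(s1, ..., sn)."""
    gamma = dict(params)  # params: list of (s', τ) pairs, i.e. Γarg
    for s in local_vars:
        if s in gamma:  # extra check: names in Γ must be distinct
            raise ValueError(f"{s} is already bound in Γ")
        gamma[s] = NS
    return gamma

def block_domains_ok(gamma, block_envs):
    """Side condition dom(Γi) = dom(Γ) for every block or handler Bi."""
    return all(set(env) == set(gamma) for env in block_envs.values())

def function_type(delta, params, ret):
    """Encode ∀Δ.(τ1, ..., τp) → τ as a nested tuple."""
    return ("forall", tuple(delta), ("fun", tuple(t for _, t in params), ret))
```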


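$$\vdash P$$

$$\frac{\vdash \Phi \quad \Phi \vdash F_i : \Phi(f_i)\ (1 \le i \le n)}{\vdash f_1 = F_1,\ldots,f_n = F_n} \quad (\mathrm{dom}(\Phi) = \{f_1,\ldots,f_n\})$$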
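At the program level, the rule requires only two things: Φ assigns a type to exactly the defined functions, and each body checks against its entry in Φ. The following is a minimal Python sketch. In it, `check_function` is a hypothetical callback that stands for Φ ⊢ F : τ, and the well-formedness premise ⊢ Φ is omitted. The sketch also rejects duplicate function names, which is an assumption about how programs are formed.

```python
def check_program(phi, defs, check_function):
    """⊢ f1 = F1, ..., fn = Fn.

    phi            : dict from function names to declared types (Φ).
    defs           : list of (f_i, F_i) pairs.
    check_function : hypothetical callback deciding Φ ⊢ F : τ.
    """
    names = [f for f, _ in defs]
    if len(names) != len(set(names)):  # assumption: names are distinct
        return False
    if set(names) != set(phi):  # dom(Φ) = {f1, ..., fn}
        return False
    return all(check_function(phi, F, phi[f]) for f, F in defs)
```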